Inferensys

ReWOO (Reasoning Without Observation)

ReWOO is an agent framework that decouples planning from execution: a language model first writes a complete plan of reasoning steps and tool calls, separate workers then execute those calls without consulting the model between steps, and a final model call turns the collected results into an answer.
AGENTIC COGNITIVE ARCHITECTURE

What is ReWOO (Reasoning Without Observation)?

ReWOO is an agent framework that decouples reasoning from execution to improve efficiency and reduce costs.

ReWOO (Reasoning Without Observation) is an agent framework that decouples planning from execution. A planner language model first generates a complete, abstract plan of reasoning steps and tool calls without executing any of them. This plan, or “reasoning blueprint,” is then passed to separate worker modules that run the tool calls and collect their outputs without calling the planner again between steps. This separation removes the costly back-and-forth between the LLM and tools seen in interleaved frameworks like ReAct.

The architecture reduces token consumption and latency because a task typically needs only two LLM calls: one to plan and one to synthesize the final answer. It also improves modularity and reliability, since the plan can be validated before anything runs and the workers follow it as ordinary, largely deterministic code. This makes ReWOO a good fit for scalable, cost-sensitive agents that perform multi-step tasks such as data analysis and API orchestration. It represents a shift from interleaved to batched reasoning.

AGENT FRAMEWORK

Key Features and Benefits of ReWOO

Because ReWOO separates planning from execution, its architecture provides distinct advantages in cost, reliability, and scalability for complex, multi-step tasks.

01

Decoupled Planning & Execution

The core innovation of ReWOO is the strict separation of the planning phase from the execution phase. A planner language model (e.g., GPT-4) first analyzes a user query and generates a complete, abstract plan consisting of:

  • Reasoning steps: The logical decomposition of the problem.
  • Tool calls: Specific actions to be taken, with their required parameters.
  • Variable dependencies: How information flows between steps.

This plan is then passed to separate, lightweight worker modules that execute the tool calls (e.g., API calls, code execution, database queries) without further LLM inference. This eliminates the need for the LLM to 'observe' intermediate results during execution, reducing latency and cost.
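
The three plan ingredients listed above can be made concrete with a small data structure. The sketch below is illustrative only: the line format `#E2 = Calculator[#E1 * 2]`, the `Step` class, and the tool names are assumptions made for this example (the placeholder naming loosely follows the notation used in the ReWOO paper), not a format this glossary prescribes.

```python
import re
from dataclasses import dataclass, field

# Assumed plan-line format: "<placeholder> = <Tool>[<argument>]"
STEP_PATTERN = re.compile(r"^(#E\d+)\s*=\s*(\w+)\[(.*)\]$")
REF_PATTERN = re.compile(r"#E\d+")


@dataclass
class Step:
    variable: str   # placeholder that will hold this step's result
    tool: str       # name of the worker to call
    argument: str   # raw argument; may mention earlier placeholders
    depends_on: list[str] = field(default_factory=list)


def parse_plan(plan_text: str) -> list[Step]:
    """Extract tool-call steps; free-text reasoning lines are skipped."""
    steps = []
    for line in plan_text.splitlines():
        match = STEP_PATTERN.match(line.strip())
        if match is None:
            continue  # e.g. "Plan: ..." lines carry reasoning, not a call
        variable, tool, argument = match.groups()
        steps.append(Step(variable, tool, argument, REF_PATTERN.findall(argument)))
    return steps


PLAN = """\
Plan: Look up the population of France.
#E1 = Search[population of France]
Plan: Double that number.
#E2 = Calculator[#E1 * 2]
"""

steps = parse_plan(PLAN)
for step in steps:
    print(step.variable, step.tool, step.depends_on)
```

Here the `depends_on` list is what captures the variable dependencies: the second step cannot run until `#E1` has a value.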

02

Dramatic Reduction in Token Cost

By generating a full plan upfront, ReWOO minimizes the number of expensive LLM calls. Traditional agent frameworks like ReAct interleave reasoning and acting, so every call must include the full history of earlier thoughts, actions, and observations. The prompt grows with each step, and the same tokens are paid for again and again.

ReWOO's efficiency comes from:

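  • Single planning call: The LLM is invoked once to create the plan, and once more at the end to synthesize the answer.
  • Compact execution: Workers handle tool calls, which typically involve cheap, deterministic compute.
  • No repeated observation tokens: The LLM does not re-read tool outputs at every step; it sees them once, in the final solver call.

The original ReWOO paper reports roughly 5x token efficiency over ReAct on the HotpotQA multi-step reasoning benchmark. Savings on other workloads depend on the number of steps and the size of the tool outputs.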
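
To see where the saving comes from, here is a back-of-the-envelope estimate. It is a simplified model, not a measurement: it counts prompt tokens only, assumes every step adds the same number of tokens (thought, action, and observation combined), and assumes ReAct makes exactly one LLM call per step. The function names and the numbers are made up for illustration.

```python
def react_prompt_tokens(base: int, per_step: int, steps: int) -> int:
    """Each ReAct call re-sends the base prompt plus all earlier steps."""
    return sum(base + i * per_step for i in range(steps))


def rewoo_prompt_tokens(base: int, per_step: int, steps: int) -> int:
    """One planner call, then one solver call that reads every step once."""
    planner = base
    solver = base + steps * per_step
    return planner + solver


# Illustrative numbers only: 500-token base prompt, 300 tokens per step, 5 steps.
react = react_prompt_tokens(500, 300, 5)
rewoo = rewoo_prompt_tokens(500, 300, 5)
saving = 1 - rewoo / react
print(react, rewoo, f"{saving:.0%}")
```

Under this model the ReAct total grows quadratically with the number of steps while the ReWOO total grows linearly, so the gap widens as tasks get longer.
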
03

Enhanced Reliability & Determinism

The decoupled architecture introduces several reliability benefits:

  • Predictable execution: The plan serves as a deterministic blueprint. Workers follow precise instructions, reducing the variability inherent in LLM-generated intermediate steps.
  • Error isolation: Failures in tool execution are contained to specific workers and can be retried or handled without corrupting the LLM's reasoning state.
  • Formal verification potential: The explicit plan structure allows for pre-execution validation. Systems can check for logical consistency, missing parameters, or unsafe tool calls before any execution begins.
  • Structured logging: The entire plan and its execution trace are easily logged and audited, providing clear visibility into the agent's decision-making process for debugging and compliance.
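
Pre-execution validation can be as simple as a pass over the parsed plan. The sketch below assumes steps are `(variable, tool, argument)` tuples and that earlier results are referenced with placeholders such as `#E1`. Both conventions, and the `validate_plan` name, are assumptions for this example.

```python
import re

REF_PATTERN = re.compile(r"#E\d+")


def validate_plan(steps, known_tools):
    """Return a list of human-readable problems; an empty list means the plan passed."""
    problems = []
    defined = set()
    for variable, tool, argument in steps:
        if tool not in known_tools:
            problems.append(f"{variable}: unknown tool '{tool}'")
        for ref in REF_PATTERN.findall(argument):
            if ref not in defined:
                problems.append(f"{variable}: uses {ref} before it is defined")
        if variable in defined:
            problems.append(f"{variable}: defined more than once")
        defined.add(variable)
    return problems


allowed = {"Search", "Calculator"}
good_plan = [("#E1", "Search", "population of France"),
             ("#E2", "Calculator", "#E1 * 2")]
bad_plan = [("#E1", "Shell", "rm -rf /"),
            ("#E2", "Calculator", "#E3 + 1")]

print(validate_plan(good_plan, allowed))
print(validate_plan(bad_plan, allowed))
```

Because the whole plan exists before anything runs, a disallowed tool or a dangling reference is caught before any worker is called, which an interleaved loop cannot guarantee.
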
04

Scalable Parallel Execution

Because the plan explicitly defines dependencies between steps, independent tasks can be identified and executed in parallel. A scheduler can analyze the plan's directed acyclic graph (DAG) and dispatch tool calls to worker pools concurrently where no data dependency exists.

This is a major advantage over sequential frameworks and can substantially reduce total task latency. For example, if an agent needs to fetch weather data from one API and stock prices from another, these independent calls can be made simultaneously. The wait is then roughly the slower of the two calls rather than their sum, which cuts response time nearly in half when both calls take about the same time.
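
A minimal scheduler along these lines can be built from the Python standard library. The sketch below runs the plan in waves: every step whose dependencies are satisfied is dispatched to a thread pool, and the next wave starts once the current one finishes. The step names, the `run_plan` helper, and the fake tool are assumptions for illustration; a production scheduler would likely start each step as soon as its own inputs are ready rather than waiting for the whole wave.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter


def run_plan(dependencies, run_step, max_workers=4):
    """dependencies maps each step name to the set of steps it must wait for."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the plan is not a DAG
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            ready = sorter.get_ready()  # steps whose inputs are all available
            outputs = pool.map(lambda name: run_step(name, results), ready)
            for name, output in zip(ready, outputs):
                results[name] = output
                sorter.done(name)
    return results


# Hypothetical plan: two independent fetches feed one summary step.
DEPS = {"weather": set(), "stocks": set(), "summary": {"weather", "stocks"}}


def fake_tool(name, results):
    time.sleep(0.1)  # stands in for network latency
    inputs = sorted(results[dep] for dep in DEPS[name])
    return f"{name}({', '.join(inputs)})"


out = run_plan(DEPS, fake_tool)
print(out["summary"])
```

In this example `weather` and `stocks` run in the same wave, so their simulated latencies overlap, and `summary` runs only after both results exist.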

05

Modular & Swappable Components

ReWOO promotes a clean, modular system design:

  • Planner LLM: Can be swapped for different models (e.g., from GPT-4 to Claude 3) based on cost or planning capability needs without changing the execution engine.
  • Worker Modules: Are specialized, single-purpose functions. New tools (calculators, search APIs, internal databases) can be added by registering a new worker and adding its description to the planner's prompt, without retraining the planner or changing the execution engine.
  • Scheduler & Memory: Can be upgraded independently (e.g., from a simple linear scheduler to a more sophisticated DAG-based one) to optimize throughput.

This separation of concerns makes the system easier to develop, test, and maintain in production environments.
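
One way to keep workers swappable is a small registry keyed by tool name. The sketch below is an assumption about how such a registry might look, not part of any ReWOO reference implementation: the `worker` decorator, the `dispatch` function, and both example tools are hypothetical.

```python
from typing import Callable

# Registry of tool name -> worker function (hypothetical design).
WORKERS: dict[str, Callable[[str], str]] = {}


def worker(name: str):
    """Decorator that registers a function as the worker for tool `name`."""
    def decorator(func: Callable[[str], str]) -> Callable[[str], str]:
        WORKERS[name] = func
        return func
    return decorator


@worker("Upper")
def upper(argument: str) -> str:
    """Convert the argument to upper case."""
    return argument.upper()


@worker("WordCount")
def word_count(argument: str) -> str:
    """Count whitespace-separated words in the argument."""
    return str(len(argument.split()))


def dispatch(tool: str, argument: str) -> str:
    """Route a planned tool call to its worker; no LLM is involved."""
    if tool not in WORKERS:
        raise KeyError(f"no worker registered for tool '{tool}'")
    return WORKERS[tool](argument)


def describe_tools() -> str:
    """Text that could be placed in the planner prompt so it knows the tools."""
    return "\n".join(f"{name}: {func.__doc__}" for name, func in WORKERS.items())


print(dispatch("WordCount", "plan first, act later"))
print(describe_tools())
```

Adding a tool then means writing one decorated function; the execution engine is untouched, and the planner learns about the tool through the generated description.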

06

Contrast with ReAct & Plan-and-Execute

ReWOO occupies a distinct point in the agent design space:

  • vs. ReAct (Reasoning + Acting): ReAct is interleaved; the LLM reasons, acts, observes the result, and then reasons again. This is flexible but token-expensive and slower. ReWOO is decoupled; it plans fully first, then acts.
  • vs. Simple Plan-and-Execute: Naive planning often produces a high-level list of goals ("1. Search web, 2. Analyze results"). ReWOO generates a detailed, executable plan with specific tool signatures and data flow, which is far more actionable for workers.

Key differentiator: ReWOO's planner outputs a programmatic specification, not just a narrative to-do list. This bridges the gap between LLM-based reasoning and deterministic software execution.

AGENT FRAMEWORK ARCHITECTURE

ReWOO vs. ReAct: A Technical Comparison

A feature-by-feature comparison of two prominent agentic reasoning frameworks, highlighting their architectural differences, performance characteristics, and suitability for various production use cases.

| Architectural Feature / Metric | ReWOO (Reasoning Without Observation) | ReAct (Reasoning and Acting) |
| --- | --- | --- |
| Core Paradigm | Decoupled planning-then-execution | Interleaved reasoning-and-acting loop |
| LLM Inference Calls per Task | 2 (one Planner call, one Solver call); the N worker executions need no LLM inference | At least one per step, so N or more for N tool calls |
| Token Efficiency | Higher (single plan generation) | Lower (repeated context of reasoning + actions) |
| Latency Profile | Predictable, parallelizable execution | Sequential, dependent on LLM per step |
| External Tool / API Integration | Delegated to separate workers | Directly interleaved in LLM response |
| Error Handling & Recovery | Plan-level validation; failed worker steps can be retried independently | Requires re-prompting the LLM within the loop, context rebuild |
| Observability & Debugging | Clear separation: inspect plan, then worker logs | Tightly coupled; trace interleaves reasoning text and actions |
| Scalability for Complex Tasks | High (workers execute in parallel, plan is a blueprint) | Moderate (sequential bottleneck, context window limits) |
| Example Use Case | Complex data pipeline with multiple API dependencies | Interactive task requiring dynamic, step-by-step environment feedback |

REWOO

Frequently Asked Questions

These questions address ReWOO's core mechanisms, advantages, and practical applications for engineers.

ReWOO (Reasoning Without Observation) is an agent framework that decouples the reasoning/planning phase from the tool execution/observation phase to reduce latency and cost. It works through a three-stage process:

  1. Planner: A large language model (LLM) receives a user query and generates a complete, abstract plan called a Working Plan. This plan is a sequence of Thoughts (reasoning steps) and Actions (tool calls with predicted arguments), but crucially, it is created without executing any tools or observing their results.
  2. Worker(s): Separate, lightweight execution workers parse the Working Plan and execute all the specified tool/API calls in parallel or sequentially, as dictated by the plan's dependencies. These workers do not require LLM inference.
  3. Solver: The results from all executed Actions are compiled, together with the original query and the plan, into a single prompt for a second LLM call (often the same model that acted as Planner). The model uses these observations to synthesize a final answer for the user, based on the now-completed reasoning chain.

This separation allows the expensive LLM to reason only twice (for planning and solving), while many cheap, parallelizable tool calls happen in between, significantly improving efficiency over interleaved frameworks like ReAct.
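
The three stages can be sketched end to end with stub functions standing in for the two LLM calls. Everything here is illustrative: `fake_planner` and `fake_solver` return canned text where a real system would call a model, and the plan format, tool names, and `#E` placeholders are assumptions for this example.

```python
import re

STEP = re.compile(r"^(#E\d+)\s*=\s*(\w+)\[(.*)\]$")
REF = re.compile(r"#E\d+")

# Hypothetical tools; real workers would call APIs, databases, and so on.
TOOLS = {
    "WordCount": lambda arg: str(len(arg.split())),
    "Double": lambda arg: str(int(arg) * 2),
}


def fake_planner(query: str) -> str:
    """Stand-in for LLM call 1: writes the whole plan without running any tool."""
    return (
        "Plan: Count the words in the query.\n"
        f"#E1 = WordCount[{query}]\n"
        "Plan: Double the count.\n"
        "#E2 = Double[#E1]\n"
    )


def run_workers(plan: str) -> dict[str, str]:
    """Execute each planned call in order, filling placeholders with earlier results."""
    evidence: dict[str, str] = {}
    for line in plan.splitlines():
        match = STEP.match(line.strip())
        if match is None:
            continue  # reasoning text, nothing to execute
        variable, tool, argument = match.groups()
        # Plain string substitution; no LLM call happens in this stage.
        argument = REF.sub(lambda m: evidence.get(m.group(0), m.group(0)), argument)
        evidence[variable] = TOOLS[tool](argument)
    return evidence


def fake_solver(query: str, plan: str, evidence: dict[str, str]) -> str:
    """Stand-in for LLM call 2: sees the query, the plan, and all evidence at once."""
    return f"Result for '{query}': {evidence['#E2']}"


query = "what is rewoo"
plan = fake_planner(query)
evidence = run_workers(plan)
answer = fake_solver(query, plan, evidence)
print(evidence, answer)
```

Only `fake_planner` and `fake_solver` correspond to model calls; everything in `run_workers` is ordinary code, which is where the cost saving comes from.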

About the author

Prasad Kumkar

CEO & MD, Inference Systems

Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over 5+ years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.

His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.