ReWOO (Reasoning Without Observation) is an agent framework that decouples planning from execution. A planner language model first generates a complete, abstract plan of reasoning steps and tool calls without executing them. This plan, or “reasoning blueprint,” is then passed to separate worker modules that perform the actual tool executions and observations independently, without further model inference. This separation eliminates the costly back-and-forth between the LLM and tools seen in frameworks like ReAct.

What is ReWOO (Reasoning Without Observation)?
ReWOO is an agent framework that decouples reasoning from execution to improve efficiency and reduce costs.
The architecture significantly reduces token consumption and latency by requiring only one or two LLM calls total. It enhances modularity and reliability, as the plan can be validated and the workers can operate deterministically. ReWOO is foundational for building scalable, cost-effective autonomous agents that perform complex, multi-step tasks like data analysis and API orchestration. It represents a shift from interleaved to batched reasoning.
Key Features and Benefits of ReWOO
ReWOO's architecture provides distinct advantages in cost, reliability, and scalability for complex, multi-step tasks.
Decoupled Planning & Execution
The core innovation of ReWOO is the strict separation of the planning phase from the execution phase. A planner language model (e.g., GPT-4) first analyzes a user query and generates a complete, abstract plan consisting of:
- Reasoning steps: The logical decomposition of the problem.
- Tool calls: Specific actions to be taken, with their required parameters.
- Variable dependencies: How information flows between steps.
This plan is then passed to separate, lightweight worker modules that execute the tool calls (e.g., API calls, code execution, database queries) without further LLM inference. This eliminates the need for the LLM to 'observe' intermediate results during execution, reducing latency and cost.
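A plan of this shape can be sketched as a small data structure. This is a minimal illustration, not ReWOO's actual implementation: the `Step` class, its field names, and the example tools are assumptions, and the `#E1`-style placeholders are used here as one simple way to express variable dependencies.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One planned tool call. Class and field names are hypothetical."""
    var: str         # placeholder that will hold this step's result, e.g. "#E1"
    thought: str     # the planner's reasoning for this step
    tool: str        # name of the worker to invoke
    tool_input: str  # argument text; may reference earlier placeholders

    def depends_on(self) -> list[str]:
        # Every "#E<n>" token in the input is a dependency on an earlier step.
        return re.findall(r"#E\d+", self.tool_input)


# A two-step plan: the second step consumes the first step's result.
plan = [
    Step("#E1", "Find the city's current temperature.", "weather", "London"),
    Step("#E2", "Convert the reading to Fahrenheit.", "calculator", "#E1 * 9 / 5 + 32"),
]

for step in plan:
    print(step.var, "needs", step.depends_on())
```

Because the dependencies are recoverable from the plan text alone, a worker or scheduler can decide execution order without asking the LLM again.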
Dramatic Reduction in Token Cost
By generating a full plan upfront, ReWOO minimizes the number of expensive LLM calls. Traditional agent frameworks like ReAct interleave reasoning and acting, requiring the LLM to process the entire history of actions and observations repeatedly, leading to long, costly context windows.
ReWOO's efficiency comes from:
- Single planning call: The LLM is invoked once to create the plan.
- Compact execution: Workers handle tool calls, which typically involve cheap, deterministic compute.
- No observational tokens: The LLM does not need to re-process lengthy tool outputs. Research indicates this can reduce token consumption by over 70% for complex tasks compared to ReAct-style agents.
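The effect can be illustrated with a back-of-envelope model of input tokens. Everything here is an assumption made for illustration: the function names, the cost model (each ReAct call re-reads the prompt plus all earlier steps, while ReWOO makes one planner call and one solver call), and the token counts, which are invented rather than measured.

```python
def react_input_tokens(prompt: int, step: int, observation: int, steps: int) -> int:
    """Input tokens when every call re-reads the prompt plus all earlier steps."""
    history_per_step = step + observation
    return sum(prompt + k * history_per_step for k in range(steps))


def rewoo_input_tokens(prompt: int, step: int, observation: int, steps: int) -> int:
    """Input tokens for one planner call plus one solver call."""
    planner_call = prompt
    # The solver reads the plan and the collected evidence exactly once.
    solver_call = prompt + steps * (step + observation)
    return planner_call + solver_call


# Hypothetical sizes: 500-token prompt, 60-token steps, 200-token observations.
react = react_input_tokens(prompt=500, step=60, observation=200, steps=5)  # 5100
rewoo = rewoo_input_tokens(prompt=500, step=60, observation=200, steps=5)  # 2300
saving = 1 - rewoo / react
print(f"ReAct: {react}, ReWOO: {rewoo}, saving: {saving:.0%}")
```

Under these made-up numbers the interleaved loop reads roughly twice as many input tokens. In this model the ReAct total grows quadratically with the number of steps while the ReWOO total grows linearly, so the gap widens for longer tasks.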
Enhanced Reliability & Determinism
The decoupled architecture introduces several reliability benefits:
- Predictable execution: The plan serves as a deterministic blueprint. Workers follow precise instructions, reducing the variability inherent in LLM-generated intermediate steps.
- Error isolation: Failures in tool execution are contained to specific workers and can be retried or handled without corrupting the LLM's reasoning state.
- Formal verification potential: The explicit plan structure allows for pre-execution validation. Systems can check for logical consistency, missing parameters, or unsafe tool calls before any execution begins.
- Structured logging: The entire plan and its execution trace are easily logged and audited, providing clear visibility into the agent's decision-making process for debugging and compliance.
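Pre-execution validation of this kind can be sketched in a few lines. The example below is an assumption-laden illustration rather than part of any ReWOO library: `validate_plan`, the `(placeholder, tool, input)` tuple layout, and the tool names are all hypothetical.

```python
import re


def validate_plan(steps, known_tools):
    """Return a list of problems found; an empty list means the plan passed.

    `steps` is a list of (placeholder, tool, tool_input) tuples, a
    hypothetical plan format chosen for this sketch.
    """
    problems = []
    defined = set()
    for var, tool, tool_input in steps:
        if tool not in known_tools:
            problems.append(f"{var}: unknown tool '{tool}'")
        for ref in re.findall(r"#E\d+", tool_input):
            if ref not in defined:
                problems.append(f"{var}: uses {ref} before it is defined")
        if var in defined:
            problems.append(f"{var}: placeholder defined twice")
        defined.add(var)
    return problems


allowed = {"search", "summarize"}
good = [("#E1", "search", "ReWOO paper"), ("#E2", "summarize", "#E1")]
bad = [("#E1", "search", "#E2"), ("#E2", "shell", "rm -rf /")]

print(validate_plan(good, allowed))  # no problems reported
print(validate_plan(bad, allowed))   # forward reference and disallowed tool
```

Nothing is executed during validation, so a rejected plan costs only the planning call that produced it.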
Scalable Parallel Execution
Because the plan explicitly defines dependencies between steps, independent tasks can be identified and executed in parallel. A scheduler can analyze the plan's directed acyclic graph (DAG) and dispatch tool calls to worker pools concurrently where no data dependency exists.
This is a major advantage over sequential frameworks, leading to significant reductions in total task latency. For example, if an agent needs to fetch weather data from one API and stock prices from another, these independent calls can be made simultaneously; when the two calls take about the same time, this cuts the tool-execution time nearly in half.
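A dependency-aware scheduler of this kind can be sketched with `asyncio`. This is a simplified illustration under stated assumptions: `run_plan`, the plan layout (`{placeholder: (tool, dependencies)}`), and the stub workers are hypothetical, and the stubs return canned values instead of calling real APIs.

```python
import asyncio


async def run_plan(steps, workers):
    """Run steps in waves; every step in a wave has all its dependencies met.

    steps:   {placeholder: (tool_name, [dependency placeholders])}
    workers: {tool_name: async callable taking the list of dependency results}
    """
    results, waves = {}, []
    pending = dict(steps)
    while pending:
        ready = [v for v, (_, deps) in pending.items()
                 if all(d in results for d in deps)]
        if not ready:
            raise ValueError("cycle or missing dependency in plan")
        # Independent steps in the same wave run concurrently.
        outputs = await asyncio.gather(
            *(workers[pending[v][0]]([results[d] for d in pending[v][1]])
              for v in ready)
        )
        for v, out in zip(ready, outputs):
            results[v] = out
            del pending[v]
        waves.append(ready)
    return results, waves


async def fetch_weather(_inputs):
    await asyncio.sleep(0.01)  # stands in for network latency
    return "12C"


async def fetch_stock(_inputs):
    await asyncio.sleep(0.01)
    return "101.5"


async def combine(inputs):
    return " | ".join(inputs)


steps = {
    "#E1": ("weather", []),
    "#E2": ("stock", []),
    "#E3": ("combine", ["#E1", "#E2"]),
}
workers = {"weather": fetch_weather, "stock": fetch_stock, "combine": combine}

results, waves = asyncio.run(run_plan(steps, workers))
print(waves)
print(results["#E3"])
```

The two fetches land in the same wave and overlap, while the combining step waits for both. A production scheduler would likely start each step as soon as its own dependencies finish rather than waiting for a whole wave.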
Modular & Swappable Components
ReWOO promotes a clean, modular system design:
- Planner LLM: Can be swapped for different models (e.g., from GPT-4 to Claude 3) based on cost or planning capability needs without changing the execution engine.
- Worker Modules: Are specialized, single-purpose functions. New tools (calculators, search APIs, internal databases) can be added by simply registering a new worker, without retraining or modifying the planner.
- Scheduler & Memory: Can be upgraded independently (e.g., from a simple linear scheduler to a more sophisticated DAG-based one) to optimize throughput.
This separation of concerns makes the system easier to develop, test, and maintain in production environments.
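Registering workers independently of the planner might look like the sketch below. The `register` decorator, the `WORKERS` registry, and `dispatch` are hypothetical names introduced for illustration, and the toy calculator only handles expressions of the form `a op b`.

```python
import operator
from typing import Callable, Dict

# Registry mapping tool names to worker functions (hypothetical design).
WORKERS: Dict[str, Callable[[str], str]] = {}

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


def register(name: str):
    """Decorator that adds a function to the registry under `name`."""
    def decorator(fn):
        WORKERS[name] = fn
        return fn
    return decorator


@register("calculator")
def calculator(expression: str) -> str:
    # Toy worker: expects exactly "left op right" separated by spaces.
    left, op, right = expression.split()
    return str(OPS[op](float(left), float(right)))


@register("upper")
def upper(text: str) -> str:
    return text.upper()


def dispatch(tool: str, tool_input: str) -> str:
    """Look up the worker named in a plan step and run it."""
    if tool not in WORKERS:
        raise KeyError(f"no worker registered for tool '{tool}'")
    return WORKERS[tool](tool_input)


print(dispatch("calculator", "6 * 7"))
print(dispatch("upper", "rewoo"))
```

Adding a tool means adding one decorated function; the dispatch logic and the planner prompt format stay unchanged, apart from listing the new tool's description for the planner.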
Contrast with ReAct & Plan-and-Execute
ReWOO occupies a distinct point in the agent design space:
- vs. ReAct (Reasoning + Acting): ReAct is interleaved; the LLM reasons, acts, observes the result, and then reasons again. This is flexible but token-expensive and slower. ReWOO is decoupled; it plans fully first, then acts.
- vs. Simple Plan-and-Execute: Naive planning often produces a high-level list of goals ("1. Search web, 2. Analyze results"). ReWOO generates a detailed, executable plan with specific tool signatures and data flow, which is far more actionable for workers.
Key differentiator: ReWOO's planner outputs a programmatic specification, not just a narrative to-do list. This bridges the gap between LLM-based reasoning and deterministic software execution.
ReWOO vs. ReAct: A Technical Comparison
A feature-by-feature comparison of two prominent agentic reasoning frameworks, highlighting their architectural differences, performance characteristics, and suitability for various production use cases.
| Architectural Feature / Metric | ReWOO (Reasoning Without Observation) | ReAct (Reasoning and Acting) |
|---|---|---|
| Core Paradigm | Decoupled planning-then-execution | Interleaved reasoning-and-acting loop |
| LLM Inference Calls per Task | 1 to 2 (Planner, plus Solver); the N worker executions need no LLM call | N or more (at least one per step) |
| Token Efficiency | Higher (single plan generation) | Lower (repeated context of reasoning + actions) |
| Latency Profile | Predictable, parallelizable execution | Sequential, dependent on LLM per step |
| External Tool / API Integration | Delegated to separate workers | Directly interleaved in LLM response |
| Error Handling & Recovery | Plan-level validation; failed worker steps can be retried independently | Requires re-prompting the LLM within the loop, context rebuild |
| Observability & Debugging | Clear separation: inspect plan, then worker logs | Tightly coupled; trace interleaves reasoning text and actions |
| Scalability for Complex Tasks | High (workers execute in parallel, plan is a blueprint) | Moderate (sequential bottleneck, context window limits) |
| Example Use Case | Complex data pipeline with multiple API dependencies | Interactive task requiring dynamic, step-by-step environment feedback |
Frequently Asked Questions
These questions address ReWOO's core mechanisms, advantages, and practical applications for engineers.
ReWOO (Reasoning Without Observation) is an agent framework that decouples the reasoning/planning phase from the tool execution/observation phase to reduce latency and cost. It works through a three-stage process:
- Planner: A large language model (LLM) receives a user query and generates a complete, abstract plan called a Working Plan. This plan is a sequence of Thoughts (reasoning steps) and Actions (tool calls with predicted arguments), but crucially, it is created without executing any tools or observing their results.
- Worker(s): Separate, lightweight execution workers parse the Working Plan and execute all the specified tool/API calls in parallel or sequentially, as dictated by the plan's dependencies. These workers do not require LLM inference.
- Solver: The results from all executed Actions are compiled and passed, together with the plan, to a final Solver LLM call (which may use the same model as the Planner). The Solver uses these observations to synthesize a final answer for the user, based on the now-completed reasoning chain.
This separation allows the expensive LLM to reason only twice (for planning and solving), while many cheap, parallelizable tool calls happen in between, significantly improving efficiency over interleaved frameworks like ReAct.
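The three stages can be sketched end to end with the LLM calls stubbed out. This is an illustrative toy, not a reference implementation: `fake_planner` and `fake_solver` stand in for real model calls, the `Plan: ... #E1 = Tool[input]` text format and its regular expression are assumptions for this sketch, and the lookup table contains invented data.

```python
import re

# Matches lines such as: "Plan: <thought> #E1 = Tool[input]" (assumed format).
STEP_RE = re.compile(r"Plan:\s*(.*?)\s*(#E\d+)\s*=\s*(\w+)\[(.*?)\]")


def fake_planner(question: str) -> str:
    """Stand-in for the planner LLM call; returns a canned plan."""
    return (
        "Plan: Look up the stock level. #E1 = Lookup[items in warehouse A]\n"
        "Plan: Double that figure. #E2 = Multiply[#E1 * 2]\n"
    )


def lookup(query: str) -> str:
    return {"items in warehouse A": "40"}.get(query, "unknown")  # invented data


def multiply(expression: str) -> str:
    left, right = expression.split("*")
    return str(int(left) * int(right))


TOOLS = {"Lookup": lookup, "Multiply": multiply}


def run_workers(plan_text: str) -> dict:
    """Execute each planned tool call in order, with no LLM involved."""
    evidence = {}
    for _thought, var, tool, tool_input in STEP_RE.findall(plan_text):
        # Substitute earlier results for their placeholders.
        resolved = re.sub(r"#E\d+", lambda m: evidence[m.group(0)], tool_input)
        evidence[var] = TOOLS[tool](resolved)
    return evidence


def fake_solver(question: str, plan_text: str, evidence: dict) -> str:
    """Stand-in for the solver LLM call; here it just returns the last result."""
    return list(evidence.values())[-1]


question = "How many items would we have if warehouse A's stock doubled?"
plan_text = fake_planner(question)
evidence = run_workers(plan_text)
answer = fake_solver(question, plan_text, evidence)
print(evidence)
print(answer)
```

Only `fake_planner` and `fake_solver` would be model calls in a real system; everything in `run_workers` is ordinary deterministic code.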
Related Terms
ReWOO is a key design pattern within the broader field of agentic cognitive architectures. These related concepts explore different approaches to planning, reasoning, and tool execution.
ReAct (Reasoning and Acting)
ReAct is a framework that interleaves verbalized reasoning traces with actionable steps, enabling a language model to perform dynamic reasoning while interacting with external tools in a single, integrated loop.
- Key Contrast: Unlike ReWOO's decoupled plan-then-execute approach, ReAct interleaves reasoning and action in a single loop, calling the LLM again after every observation.
- Process: The model generates a thought (e.g., 'I need to search for the current weather'), then an action (e.g., search('weather in London')), observes the result, and repeats.
- Use Case: Ideal for exploratory tasks where the optimal plan cannot be fully determined upfront and must adapt to real-time observations.
Tree-of-Thoughts (ToT)
Tree-of-Thoughts is a reasoning framework that extends Chain-of-Thought by exploring multiple reasoning paths in parallel, forming a search tree of intermediate steps.
- Core Mechanism: The language model generates several potential next steps for a problem, evaluates them, and uses search algorithms (e.g., breadth-first, depth-first) to explore promising branches.
- Relation to ReWOO: Both separate planning from answer generation. ToT focuses on exploring a space of reasoning steps, while ReWOO focuses on planning a sequence of tool calls.
- Application: Best suited for complex problems with multiple valid solution paths, such as strategic game playing or creative writing.
Program-Aided Language Models (PAL)
Program-Aided Language Models is a technique where a language model generates reasoning steps as executable code (e.g., Python), which is then run by an external interpreter to compute the final answer.
- Execution Decoupling: Similar to ReWOO, PAL decouples logical planning (code generation) from deterministic execution (code runtime).
- Key Difference: PAL's 'tool' is a general-purpose code interpreter, while ReWOO plans for diverse, specific external APIs and tools.
- Strength: Provides deterministic computational accuracy for mathematical and algorithmic problems, offloading precise calculation from the LLM.
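The PAL pattern can be sketched as follows. The model call is replaced by a stub, and `fake_model`, `run_program`, and the convention that the program stores its result in a variable named `answer` are assumptions for this illustration. Emptying `__builtins__` only blocks casual use of built-in functions; it is not a security sandbox.

```python
def fake_model(question: str) -> str:
    """Stand-in for an LLM that answers by writing Python code."""
    return "apples = 3 * 12\nleft = apples - 5\nanswer = left"


def run_program(source: str):
    """Run generated code and return the value it stored in `answer`."""
    # Empty builtins limit what the code can reach; NOT a real sandbox.
    namespace = {"__builtins__": {}}
    exec(source, namespace)
    return namespace["answer"]


question = "I buy 3 dozen apples and eat 5. How many are left?"
program = fake_model(question)
print(run_program(program))
```

The arithmetic is performed by the interpreter rather than by the model, which is the source of PAL's computational accuracy.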
Automated Planning Systems
Automated Planning is a field of AI focused on algorithms that generate sequences of actions (plans) to achieve a specified goal, given a description of the starting state and available actions.
- Classical Foundation: ReWOO implements a language model-based planner within this classical paradigm. The LLM acts as the planning algorithm, producing a sequence of tool calls (actions).
- Formalisms: Classical planners often use representations like STRIPS or PDDL (Planning Domain Definition Language). ReWOO instead expresses its plan in natural language annotated with tool calls.
- System Benefit: Decoupling allows the generated plan to be validated, optimized, or executed by specialized, non-LLM systems for reliability.
Hierarchical Task Networks (HTN)
Hierarchical Task Networks are a planning methodology where complex high-level tasks are recursively decomposed into simpler subtasks until primitive, executable actions are reached.
- Structural Parallel: ReWOO's planner effectively performs a form of HTN decomposition, breaking a user query into a linear sequence of tool-using subtasks.
- Composition: Both approaches rely on a library of methods (for HTN) or tool descriptions (for ReWOO) to perform this decomposition.
- Engineering Value: Provides a structured, auditable blueprint for agent execution, crucial for debugging and validating complex multi-step processes.
Tool-Augmented Reasoning
Tool-Augmented Reasoning is the broad approach of enhancing a language model's capabilities by allowing it to call external tools (calculators, APIs, databases) during its reasoning process.
- Umbrella Category: ReWOO, ReAct, and PAL are all specific architectures under this category.
- Design Spectrum: Ranges from tightly interleaved (ReAct) to fully decoupled (ReWOO). The choice trades off flexibility against cost and latency.
- Key Advantage: Overcomes inherent LLM limitations in areas like factual retrieval, precise calculation, and real-time data access.

About the author
Prasad Kumkar
CEO & MD, Inference Systems
Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over 5+ years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.
His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.