Glossary

ReWOO (Reasoning Without Observation)

ReWOO is an agent framework that decouples planning from execution, where a language model creates a complete plan of reasoning steps and tool calls, which are then executed by separate workers without further model inference.

AGENTIC COGNITIVE ARCHITECTURE

What is ReWOO (Reasoning Without Observation)?

ReWOO is an agent framework that decouples reasoning from execution to improve efficiency and reduce costs.

ReWOO (Reasoning Without Observation) is an agent framework that decouples planning from execution. A planner language model first generates a complete, abstract plan of reasoning steps and tool calls without executing them. This plan, or “reasoning blueprint,” is then passed to separate worker modules that perform the actual tool executions and observations independently, without further model inference. This separation eliminates the costly back-and-forth between the LLM and tools seen in frameworks like ReAct.

The architecture reduces token consumption and latency because the basic design needs only two LLM calls: one to plan and one to synthesize the final answer. It also improves modularity and reliability, since the plan can be validated before execution and the workers can operate deterministically. ReWOO is well suited to scalable, cost-effective autonomous agents that perform complex, multi-step tasks like data analysis and API orchestration. It represents a shift from interleaved to batched reasoning.

AGENT FRAMEWORK

Key Features and Benefits of ReWOO

ReWOO's decoupled architecture provides distinct advantages in cost, reliability, and scalability for complex, multi-step tasks.

Decoupled Planning & Execution

The core innovation of ReWOO is the strict separation of the planning phase from the execution phase. A planner language model (e.g., GPT-4) first analyzes a user query and generates a complete, abstract plan consisting of:

Reasoning steps: The logical decomposition of the problem.
Tool calls: Specific actions to be taken, with their required parameters.
Variable dependencies: How information flows between steps.

This plan is then passed to separate, lightweight worker modules that execute the tool calls (e.g., API calls, code execution, database queries) without further LLM inference. This eliminates the need for the LLM to 'observe' intermediate results during execution, reducing latency and cost.
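The three parts of a plan can be pictured as a small data structure. The sketch below is illustrative only: the `Step` class, its field names, the `#E1`-style placeholder names, and the example query are all assumptions made for this example, not a fixed ReWOO schema.

```python
import re
from dataclasses import dataclass


@dataclass
class Step:
    """One planned step: a reasoning note plus a single tool call (hypothetical shape)."""
    var: str      # placeholder for this step's result, e.g. "#E1"
    thought: str  # the planner's reasoning for this step
    tool: str     # name of the tool a worker should run
    arg: str      # tool input; may reference earlier placeholders

    def depends_on(self) -> list[str]:
        # Placeholders mentioned in the argument are this step's dependencies.
        return re.findall(r"#E\d+", self.arg)


plan = [
    Step("#E1", "Find the city's population.", "search", "population of Lyon"),
    Step("#E2", "Find the city's area.", "search", "area of Lyon in km2"),
    Step("#E3", "Compute the density.", "calculator", "#E1 / #E2"),
]

# Variable dependencies: which earlier results each step needs.
deps = {step.var: step.depends_on() for step in plan}
```

Here `deps` records that the third step needs both earlier results, while the first two steps need nothing and could run in any order.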

Dramatic Reduction in Token Cost

By generating a full plan upfront, ReWOO minimizes the number of expensive LLM calls. Traditional agent frameworks like ReAct interleave reasoning and acting, requiring the LLM to process the entire history of actions and observations repeatedly, leading to long, costly context windows.

ReWOO's efficiency comes from:

Single planning call: The LLM is invoked once to create the plan, and once more at the end to synthesize the answer.
Compact execution: Workers handle tool calls, which typically involve cheap, deterministic compute.
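No observational tokens: The LLM does not re-process lengthy tool outputs at every step; it sees them once, when producing the final answer. The original ReWOO paper reports roughly 5x token efficiency over ReAct on the HotpotQA benchmark, though actual savings depend on the task and on the length of tool outputs.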
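A rough back-of-the-envelope model shows where the savings come from. It is a simplification: it counts input tokens only, assumes every step adds the same number of tokens, and uses made-up numbers (500, 200, 5) chosen purely for illustration.

```python
def react_input_tokens(prompt: int, per_step: int, steps: int) -> int:
    # Interleaved loop: call k re-reads the prompt plus all k earlier steps.
    return sum(prompt + k * per_step for k in range(steps))


def rewoo_input_tokens(prompt: int, per_step: int, steps: int) -> int:
    # Planner reads the prompt once; solver reads the prompt plus all step results once.
    return prompt + (prompt + steps * per_step)


react = react_input_tokens(prompt=500, per_step=200, steps=5)
rewoo = rewoo_input_tokens(prompt=500, per_step=200, steps=5)
saving = 1 - rewoo / react  # fraction of input tokens saved in this toy model
```

In this toy model the interleaved cost grows quadratically with the number of steps, while the decoupled cost grows linearly, so the gap widens as tasks get longer.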

Enhanced Reliability & Determinism

The decoupled architecture introduces several reliability benefits:

Predictable execution: The plan serves as a deterministic blueprint. Workers follow precise instructions, reducing the variability inherent in LLM-generated intermediate steps.
Error isolation: Failures in tool execution are contained to specific workers and can be retried or handled without corrupting the LLM's reasoning state.
Formal verification potential: The explicit plan structure allows for pre-execution validation. Systems can check for logical consistency, missing parameters, or unsafe tool calls before any execution begins.
Structured logging: The entire plan and its execution trace are easily logged and audited, providing clear visibility into the agent's decision-making process for debugging and compliance.
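Pre-execution validation can be as simple as a pass over the plan before any worker runs. The following is a minimal sketch under stated assumptions: the dictionary keys (`var`, `tool`, `arg`), the `#E1`-style placeholders, and the `ALLOWED_TOOLS` allow-list are hypothetical names for this example.

```python
import re

ALLOWED_TOOLS = {"search", "calculator"}  # hypothetical allow-list of safe tools


def validate_plan(steps: list[dict]) -> list[str]:
    """Return a list of problems; an empty list means the plan passed."""
    problems = []
    defined = set()
    for i, step in enumerate(steps, start=1):
        if step["tool"] not in ALLOWED_TOOLS:
            problems.append(f"step {i}: unknown tool {step['tool']!r}")
        if not step["arg"].strip():
            problems.append(f"step {i}: missing argument")
        for ref in re.findall(r"#E\d+", step["arg"]):
            if ref not in defined:
                problems.append(f"step {i}: {ref} used before it is defined")
        if step["var"] in defined:
            problems.append(f"step {i}: {step['var']} defined twice")
        defined.add(step["var"])
    return problems


good = [
    {"var": "#E1", "tool": "search", "arg": "population of Lyon"},
    {"var": "#E2", "tool": "calculator", "arg": "#E1 / 48"},
]
bad = [
    {"var": "#E1", "tool": "shell", "arg": "ls"},
    {"var": "#E2", "tool": "calculator", "arg": "#E3 + 1"},
]
```

Calling `validate_plan(good)` returns an empty list, while `validate_plan(bad)` flags the disallowed tool and the undefined reference before anything is executed.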

Scalable Parallel Execution

Because the plan explicitly defines dependencies between steps, independent tasks can be identified and executed in parallel. A scheduler can analyze the plan's directed acyclic graph (DAG) and dispatch tool calls to worker pools concurrently where no data dependency exists.

This is a major advantage over sequential frameworks and can significantly reduce total task latency. For example, if an agent needs to fetch weather data from one API and stock prices from another, these independent calls can be made simultaneously, so the total wait approaches the duration of the slower call rather than the sum of both (close to half when the two calls take similar time).
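One way to sketch such a scheduler is with Python's standard `graphlib` and a thread pool. This is an illustrative wave-based approach, not a prescribed ReWOO scheduler: the step names and the `run_step` stub are hypothetical, and a production scheduler might dispatch each step as soon as its own dependencies finish instead of waiting for a whole wave.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter


def run_in_waves(deps, run_step):
    """Run steps wave by wave; steps within a wave run concurrently."""
    sorter = TopologicalSorter(deps)  # maps each step to the steps it depends on
    sorter.prepare()                  # raises CycleError if the plan is not a DAG
    results, waves = {}, []
    with ThreadPoolExecutor() as pool:
        while sorter.is_active():
            ready = sorted(sorter.get_ready())  # steps whose dependencies are done
            waves.append(ready)
            values = list(pool.map(lambda name: run_step(name, results), ready))
            results.update(zip(ready, values))
            sorter.done(*ready)
    return results, waves


def run_step(name, results):
    # Stand-in for a real tool call; "summary" consumes the two earlier results.
    if name == "summary":
        return f"{results['weather']}, {results['stocks']}"
    return f"{name} ok"


deps = {"weather": set(), "stocks": set(), "summary": {"weather", "stocks"}}
results, waves = run_in_waves(deps, run_step)
```

The two independent fetches land in the first wave and run side by side; the dependent step runs only after both have finished.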

Modular & Swappable Components

ReWOO promotes a clean, modular system design:

Planner LLM: Can be swapped for different models (e.g., from GPT-4 to Claude 3) based on cost or planning capability needs without changing the execution engine.
Worker Modules: Are specialized, single-purpose functions. New tools (calculators, search APIs, internal databases) can be added by simply registering a new worker, without retraining or modifying the planner.
Scheduler & Memory: Can be upgraded independently (e.g., from a simple linear scheduler to a more sophisticated DAG-based one) to optimize throughput.

This separation of concerns makes the system easier to develop, test, and maintain in production environments.
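Registering a new worker can be sketched with a plain dictionary and a decorator. The names below (`WORKERS`, `worker`, `dispatch`, and the two toy tools) are hypothetical and only illustrate the idea that the execution engine looks tools up by name.

```python
from typing import Callable

WORKERS: dict[str, Callable[[str], str]] = {}  # tool name -> worker function


def worker(name: str):
    """Decorator that registers a function as the worker for a tool name."""
    def register(fn: Callable[[str], str]) -> Callable[[str], str]:
        WORKERS[name] = fn
        return fn
    return register


@worker("upper")
def upper_tool(arg: str) -> str:
    return arg.upper()


@worker("word_count")
def word_count_tool(arg: str) -> str:
    return str(len(arg.split()))


def dispatch(tool: str, arg: str) -> str:
    # The execution engine only needs the tool name that appears in the plan.
    return WORKERS[tool](arg)
```

Adding a tool means adding one decorated function; the dispatch code stays unchanged, and the planner only needs the new tool's description in its prompt.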

Contrast with ReAct & Plan-and-Execute

ReWOO occupies a distinct point in the agent design space:

vs. ReAct (Reasoning + Acting): ReAct is interleaved; the LLM reasons, acts, observes the result, and then reasons again. This is flexible but token-expensive and slower. ReWOO is decoupled; it plans fully first, then acts.
vs. Simple Plan-and-Execute: Naive planning often produces a high-level list of goals ("1. Search web, 2. Analyze results"). ReWOO generates a detailed, executable plan with specific tool signatures and data flow, which is far more actionable for workers.

Key differentiator: ReWOO's planner outputs a programmatic specification, not just a narrative to-do list. This bridges the gap between LLM-based reasoning and deterministic software execution.
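To make the difference concrete, the sketch below parses planner text into a machine-readable list of steps. The text format (`Plan:` lines followed by `#E1 = tool[input]` lines) is an assumption for this example; a real system would define its own format in the planner prompt.

```python
import re

# Matches lines such as "#E1 = search[population of Lyon]" (assumed format).
STEP_RE = re.compile(r"^(#E\d+)\s*=\s*(\w+)\[(.*)\]$")


def parse_plan(text: str) -> list[dict]:
    """Extract executable steps from planner output, skipping narrative lines."""
    steps = []
    for line in text.splitlines():
        match = STEP_RE.match(line.strip())
        if match:
            var, tool, arg = match.groups()
            steps.append({"var": var, "tool": tool, "arg": arg})
    return steps


planner_output = """\
Plan: Find the population.
#E1 = search[population of Lyon]
Plan: Find the area.
#E2 = search[area of Lyon in km2]
Plan: Divide the two.
#E3 = calculator[#E1 / #E2]
"""
steps = parse_plan(planner_output)
```

The narrative `Plan:` lines remain useful for logging, but only the parsed steps drive execution.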

AGENT FRAMEWORK ARCHITECTURE

ReWOO vs. ReAct: A Technical Comparison

A feature-by-feature comparison of two prominent agentic reasoning frameworks, highlighting their architectural differences, performance characteristics, and suitability for various production use cases.

Architectural Feature / Metric	ReWOO (Reasoning Without Observation)	ReAct (Reasoning and Acting)
Core Paradigm	Decoupled planning-then-execution	Interleaved reasoning-and-acting loop
LLM Inference Calls per Task	2 (Planner + Solver); worker tool executions need no LLM call	One per reasoning step (N or more for an N-step task)
Token Efficiency	Higher (single plan generation)	Lower (repeated context of reasoning + actions)
Latency Profile	Predictable, parallelizable execution	Sequential, dependent on LLM per step
External Tool / API Integration	Delegated to separate workers	Directly interleaved in LLM response
Error Handling & Recovery	Plan-level validation; failed worker steps can be retried independently	Requires re-prompting the LLM within the loop, context rebuild
Observability & Debugging	Clear separation: inspect plan, then worker logs	Tightly coupled; trace interleaves reasoning text and actions
Scalability for Complex Tasks	High (workers execute in parallel, plan is a blueprint)	Moderate (sequential bottleneck, context window limits)
Example Use Case	Complex data pipeline with multiple API dependencies	Interactive task requiring dynamic, step-by-step environment feedback

REWOO

Frequently Asked Questions

These questions address ReWOO's core mechanisms, advantages, and practical applications for engineers.

ReWOO (Reasoning Without Observation) is an agent framework that decouples the reasoning/planning phase from the tool execution/observation phase to reduce latency and cost. It works through a three-stage process:

Planner: A large language model (LLM) receives a user query and generates a complete, abstract plan called a Working Plan. This plan is a sequence of Thoughts (reasoning steps) and Actions (tool calls with predicted arguments), but crucially, it is created without executing any tools or observing their results.
Worker(s): Separate, lightweight execution workers parse the Working Plan and execute all the specified tool/API calls in parallel or sequentially, as dictated by the plan's dependencies. These workers do not require LLM inference.
Solver: The results from all executed Actions are compiled and passed, together with the original plan, to a Solver LLM call (often served by the same model as the Planner). The Solver uses these observations to synthesize a final answer for the user, based on the now-completed reasoning chain.

This separation allows the expensive LLM to reason only twice (for planning and solving), while many cheap, parallelizable tool calls happen in between, significantly improving efficiency over interleaved frameworks like ReAct.
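The three stages can be sketched end to end with stubbed model calls. Everything here is a toy under stated assumptions: `fake_planner` and `fake_solver` stand in for the two LLM calls, and the tools, prices, and `#E1`-style placeholders are hypothetical.

```python
import re

PRICES = {"apples": 3, "pears": 4}  # toy data source for the lookup tool
TOOLS = {
    "lookup": lambda arg: str(PRICES[arg]),
    "add": lambda arg: str(sum(int(x) for x in arg.split())),
}


def fake_planner(query: str) -> list[dict]:
    # Stand-in for LLM call 1: a real planner would generate this plan.
    return [
        {"var": "#E1", "tool": "lookup", "arg": "apples"},
        {"var": "#E2", "tool": "lookup", "arg": "pears"},
        {"var": "#E3", "tool": "add", "arg": "#E1 #E2"},
    ]


def run_workers(plan: list[dict]) -> dict:
    # No model calls here: substitute earlier results, then run the tool.
    evidence = {}
    for step in plan:
        arg = re.sub(r"#E\d+", lambda m: evidence[m.group(0)], step["arg"])
        evidence[step["var"]] = TOOLS[step["tool"]](arg)
    return evidence


def fake_solver(query: str, plan: list[dict], evidence: dict) -> str:
    # Stand-in for LLM call 2: a real solver would read the plan and all evidence.
    return f"{query} -> {evidence[plan[-1]['var']]}"


def rewoo(query: str) -> str:
    plan = fake_planner(query)
    evidence = run_workers(plan)
    return fake_solver(query, plan, evidence)


answer = rewoo("Total price of apples and pears?")
```

However many tool calls the plan contains, the model is consulted exactly twice in this flow.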

AGENTIC COGNITIVE ARCHITECTURES

Related Terms

ReWOO is a key design pattern within the broader field of agentic cognitive architectures. These related concepts explore different approaches to planning, reasoning, and tool execution.

ReAct (Reasoning and Acting)

ReAct is a framework that interleaves verbalized reasoning traces with actionable steps, enabling a language model to perform dynamic reasoning while interacting with external tools in a single, integrated loop.

Key Contrast: Unlike ReWOO's decoupled plan-then-execute approach, ReAct interleaves reasoning and action across repeated inference calls, typically one per step, with each call seeing the observations gathered so far.
Process: The model generates a thought (e.g., 'I need to search for the current weather'), then an action (e.g., search('weather in London')), observes the result, and repeats.
Use Case: Ideal for exploratory tasks where the optimal plan cannot be fully determined upfront and must adapt to real-time observations.
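For comparison, a ReAct-style loop can be sketched with a scripted stand-in for the model. The `scripted_llm` function, the `search` stub, and the `Action:` / `Final Answer:` text format are assumptions for this example, not the format of any particular library.

```python
def scripted_llm(transcript: str) -> str:
    # Hypothetical stand-in for a model: acts first, answers once it has an observation.
    if "Observation:" not in transcript:
        return "Action: search[weather in London]"
    return "Final Answer: It is rainy in London."


def search(query: str) -> str:
    return "rainy"  # stub tool result


def react(question: str, llm, max_steps: int = 5):
    """Loop: call the model, run its action, append the observation, repeat."""
    transcript = f"Question: {question}"
    calls = 0
    for _ in range(max_steps):
        output = llm(transcript)  # one inference call per step
        calls += 1
        if output.startswith("Final Answer:"):
            return output.removeprefix("Final Answer:").strip(), calls
        query = output[len("Action: search["):-1]
        transcript += f"\n{output}\nObservation: {search(query)}"
    raise RuntimeError("no answer within the step limit")


answer, llm_calls = react("What is the weather in London?", scripted_llm)
```

The transcript grows with every step and is re-sent on each call, which is the source of both the flexibility and the token cost.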

Tree-of-Thoughts (ToT)

Tree-of-Thoughts is a reasoning framework that extends Chain-of-Thought by exploring multiple reasoning paths in parallel, forming a search tree of intermediate steps.

Core Mechanism: The language model generates several potential next steps for a problem, evaluates them, and uses search algorithms (e.g., breadth-first, depth-first) to explore promising branches.
Relation to ReWOO: Both structure the reasoning explicitly before a final answer is produced. ToT focuses on exploring a space of reasoning steps, while ReWOO focuses on planning a sequence of tool calls.
Application: Best suited for complex problems with multiple valid solution paths, such as strategic game playing or creative writing.

Program-Aided Language Models (PAL)

Program-Aided Language Models is a technique where a language model generates reasoning steps as executable code (e.g., Python), which is then run by an external interpreter to compute the final answer.

Execution Decoupling: Similar to ReWOO, PAL decouples logical planning (code generation) from deterministic execution (code runtime).
Key Difference: PAL's 'tool' is a general-purpose code interpreter, while ReWOO plans for diverse, specific external APIs and tools.
Strength: Provides deterministic computational accuracy for mathematical and algorithmic problems, offloading precise calculation from the LLM.
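The PAL pattern can be sketched with a stubbed code generator and Python's built-in `exec`. The `fake_codegen` function and the convention that the program stores its result in a variable named `answer` are assumptions for this example; emptying `__builtins__` is not a real sandbox, so untrusted model output would need proper isolation.

```python
def fake_codegen(question: str) -> str:
    # Stand-in for the model: returns a program instead of a prose answer.
    return (
        "eggs = 16\n"
        "eaten = 3\n"
        "baked = 4\n"
        "answer = (eggs - eaten - baked) * 2\n"
    )


def run_program(source: str):
    """Execute generated code and read back the variable named 'answer'."""
    scope = {}
    exec(source, {"__builtins__": {}}, scope)  # NOT a security boundary
    return scope["answer"]


result = run_program(fake_codegen("How much does she earn selling the rest at $2 each?"))
```

The arithmetic is done by the interpreter, so the result does not depend on the model computing the numbers correctly.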

Automated Planning Systems

Automated Planning is a field of AI focused on algorithms that generate sequences of actions (plans) to achieve a specified goal, given a description of the starting state and available actions.

Classical Foundation: ReWOO implements a language model-based planner within this classical paradigm. The LLM acts as the planning algorithm, producing a sequence of tool calls (actions).
Formalisms: Classical planners often use formal representations like STRIPS or PDDL (Planning Domain Definition Language). ReWOO instead expresses plans in natural language combined with structured tool calls.
System Benefit: Decoupling allows the generated plan to be validated, optimized, or executed by specialized, non-LLM systems for reliability.

Hierarchical Task Networks (HTN)

Hierarchical Task Networks are a planning methodology where complex high-level tasks are recursively decomposed into simpler subtasks until primitive, executable actions are reached.

Structural Parallel: ReWOO's planner loosely resembles a single level of HTN decomposition, breaking a user query into an ordered set of tool-using subtasks.
Composition: Both approaches rely on a library of methods (for HTN) or tool descriptions (for ReWOO) to perform this decomposition.
Engineering Value: Provides a structured, auditable blueprint for agent execution, crucial for debugging and validating complex multi-step processes.
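HTN-style decomposition can be illustrated with a small recursive function. The method library and task names below are hypothetical, and real HTN planners also handle preconditions and alternative methods, which this sketch omits.

```python
# Hypothetical method library: compound task -> ordered subtasks.
METHODS = {
    "answer_density_question": ["get_population", "get_area", "divide"],
    "get_population": ["search"],
    "get_area": ["search"],
}
PRIMITIVES = {"search", "divide"}  # tasks a worker can execute directly


def decompose(task: str) -> list[str]:
    """Recursively expand a task until only primitive actions remain."""
    if task in PRIMITIVES:
        return [task]
    return [action for sub in METHODS[task] for action in decompose(sub)]


actions = decompose("answer_density_question")
```

The flat list of primitive actions plays the same role as the tool calls in a ReWOO plan.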

Tool-Augmented Reasoning

Tool-Augmented Reasoning is the broad approach of enhancing a language model's capabilities by allowing it to call external tools (calculators, APIs, databases) during its reasoning process.

Umbrella Category: ReWOO, ReAct, and PAL are all specific architectures under this category.
Design Spectrum: Ranges from tightly interleaved (ReAct) to fully decoupled (ReWOO). The choice trades off flexibility against cost and latency.
Key Advantage: Overcomes inherent LLM limitations in areas like factual retrieval, precise calculation, and real-time data access.

About the author

Prasad Kumkar

CEO & MD, Inference Systems

Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over more than five years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.

His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.
