Inferensys

Glossary

Plan Repair

Plan repair is the process of modifying an existing plan that has failed during execution due to unexpected state changes, often using local modifications instead of full re-planning.
AUTOMATED PLANNING SYSTEMS

What is Plan Repair?

Plan repair, sometimes loosely called replanning, is a core capability in automated planning systems: an existing plan is modified after it fails during execution, often because the environment has changed in ways the planner did not anticipate.

Rather than generating a new plan from scratch, plan repair typically applies local modifications to the failing plan. This matters for autonomous agents operating in dynamic, uncertain environments, where the planner's assumptions about the world state can become invalid mid-execution. The goal is to produce, from the new and unexpected state, a corrected plan that still achieves the original objectives. It should disturb the existing plan as little as possible and usually cost less computation than full re-planning.
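
One way to make "disturb the existing plan as little as possible" concrete is a plan-distance measure. The sketch below assumes that plans are plain lists of action names. It counts the actions that appear in one plan but not the other, a multiset difference in the spirit of the plan-stability measures used to compare repair with replanning. The action names are hypothetical.

```python
from collections import Counter


def plan_distance(old_plan: list[str], new_plan: list[str]) -> int:
    """Number of actions present in one plan but not the other (multiset difference)."""
    old_counts, new_counts = Counter(old_plan), Counter(new_plan)
    removed = old_counts - new_counts  # actions the new plan dropped
    added = new_counts - old_counts    # actions the new plan introduced
    return sum(removed.values()) + sum(added.values())


# Hypothetical delivery-robot plans.
original = ["goto_door", "open_door", "goto_room", "drop_parcel"]
repaired = ["goto_door", "request_key", "unlock_door", "open_door", "goto_room", "drop_parcel"]
rerouted = ["goto_elevator", "ride_up", "goto_side_door", "open_side_door", "goto_room", "drop_parcel"]

print(plan_distance(original, repaired))  # 2: the patch only adds two actions
print(plan_distance(original, rerouted))  # 6: two actions dropped, four added
```

Both new plans contain six actions, but the locally patched one scores far closer to the original.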

Effective plan repair strategies leverage the structure of the original plan and the nature of the failure. Techniques range from simple patches, such as reordering or substituting actions, to more sophisticated methods that reason about causal links and landmarks. Plan repair works together with execution monitoring, which detects the failures that trigger it. It is a key component of agentic cognitive architectures, enabling systems to recover from setbacks and continue pursuing complex, multi-step goals autonomously.


Core Characteristics of Plan Repair


01

Localized Modification

Plan repair focuses on making minimal changes to an existing, failed plan rather than discarding it and starting from scratch. This involves identifying the specific point of failure and adjusting subsequent actions to accommodate the new world state. The core principle is incrementalism: preserving as much of the original, valid plan structure as possible to conserve computational effort and maintain plan stability. For example, if a delivery robot finds a door locked, a repair algorithm might insert a 'request key' action rather than re-planning the entire route from the warehouse.
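
The door example can be sketched in a STRIPS-style model, assuming each action carries sets of precondition, add, and delete facts. The hypothetical `repair_by_insertion` below walks the remaining plan. At the first step whose preconditions fail, it inserts one library action that supplies them and leaves every other step untouched. The fact and action names, including `request_key`, are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    pre: frozenset
    add: frozenset = frozenset()
    delete: frozenset = frozenset()


def act(name, pre, add=(), delete=()):
    """Convenience constructor that freezes the fact sets."""
    return Action(name, frozenset(pre), frozenset(add), frozenset(delete))


def apply(state, action):
    """STRIPS progression: remove delete effects, then add add effects."""
    return (state - action.delete) | action.add


def repair_by_insertion(state, plan, library):
    """Insert a supporting action before any step whose preconditions fail;
    return None if no single library action can patch the gap."""
    repaired = []
    for step in plan:
        missing = step.pre - state
        if missing:
            patch = next((a for a in library if missing <= a.add and a.pre <= state), None)
            if patch is None:
                return None  # local repair failed; fall back to full replanning
            repaired.append(patch)
            state = apply(state, patch)
            if not step.pre <= state:  # the patch deleted something the step needs
                return None
        repaired.append(step)
        state = apply(state, step)
    return repaired


plan = [
    act("open_door", {"at_door", "door_unlocked"}, {"door_open"}),
    act("enter_room", {"at_door", "door_open"}, {"in_room"}, {"at_door"}),
    act("drop_parcel", {"in_room", "has_parcel"}, {"delivered"}, {"has_parcel"}),
]
# Assumed library action: someone unlocks the door when the robot asks.
library = [act("request_key", {"at_door"}, {"door_unlocked"})]
observed = frozenset({"at_door", "has_parcel"})  # the door turned out to be locked

result = repair_by_insertion(observed, plan, library)
print([a.name for a in result])  # ['request_key', 'open_door', 'enter_room', 'drop_parcel']
```

This sketch takes the first matching library action. A real system would usually compare several candidate patches and positions.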

02

Execution Monitoring & Failure Detection

Repair is triggered by a discrepancy between the expected state (as predicted by the plan's effects) and the observed state during execution. This requires continuous execution monitoring to detect failures such as:

  • Precondition violations: An action's required conditions are not met.
  • Unexpected state changes: External events alter the world independently of the agent's actions.
  • Action execution failures: An action is attempted but does not produce its intended effects.

The monitoring system compares sensor readings or state assertions against the plan's timeline to identify the precise moment and nature of the failure.

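The three failure types can be told apart by comparing the model's prediction for a step with what the agent actually observes. This is a minimal sketch, assuming a set-of-facts state model with precondition, add, and delete sets. The label strings and the `check_step` name are hypothetical, and only missing add effects are treated as execution failures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    pre: frozenset
    add: frozenset = frozenset()
    delete: frozenset = frozenset()


def check_step(before, action, after):
    """Compare observed states around one step with the plan's predictions.

    before: facts observed just before the step; after: facts observed just after.
    Returns a list of (failure_type, facts) pairs; an empty list means no discrepancy.
    """
    missing_pre = action.pre - before
    if missing_pre:
        # The step was not applicable; a monitor would normally catch this before executing it.
        return [("precondition_violation", missing_pre)]
    failures = []
    expected = (before - action.delete) | action.add
    missing_effects = action.add - after  # intended (add) effects that did not appear
    if missing_effects:
        failures.append(("action_execution_failure", missing_effects))
    # Any other difference from the prediction was caused by something outside the plan.
    unexplained = (after ^ expected) - missing_effects
    if unexplained:
        failures.append(("unexpected_state_change", unexplained))
    return failures


open_door = Action("open_door", frozenset({"at_door"}), frozenset({"door_open"}))

# The door stayed shut, and the lights went off independently of the robot.
print(check_step(frozenset({"at_door", "lights_on"}), open_door, frozenset({"at_door"})))
# The robot is not at the door, so the step is not applicable.
print(check_step(frozenset({"lights_on"}), open_door, frozenset({"lights_on"})))
```
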
03

Replanning vs. Plan Repair

Although the terms are often used interchangeably, replanning and plan repair represent different points on a spectrum. Classical replanning treats the current, unexpected state as a new initial state and invokes the planner from scratch. With a sound and complete planner this yields a valid plan whenever one exists, but at potentially high computational cost. Plan repair instead attempts to patch the existing plan. This is often cheaper in practice and keeps the new plan close to the old one, although it is not guaranteed to be faster in every case. The choice depends on the severity of the failure, time constraints, and domain dynamics. In highly dynamic environments, speed of response is often critical, which favors local modifications.
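
A common way to combine the two strategies is to attempt a bounded repair first and fall back to full replanning only when that fails. The sketch below assumes a STRIPS-style model and uses breadth-first search for both phases. The repair phase looks for a short bridge (at most `repair_depth` actions) from the current state to the earliest suffix of the old plan that still reaches the goal. The fallback plans from scratch. The function names, depth limits, and toy domain are hypothetical choices for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    pre: frozenset
    add: frozenset = frozenset()
    delete: frozenset = frozenset()


def run(state, plan):
    """Final state if every step is applicable in sequence, else None."""
    for a in plan:
        if not a.pre <= state:
            return None
        state = (state - a.delete) | a.add
    return state


def bfs(state, actions, accept, max_depth):
    """Shortest action sequence (at most max_depth long) to a state where accept() holds."""
    frontier, seen = deque([(state, [])]), {state}
    while frontier:
        s, path = frontier.popleft()
        if accept(s):
            return path
        if len(path) == max_depth:
            continue
        for a in actions:
            if a.pre <= s:
                nxt = (s - a.delete) | a.add
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, path + [a]))
    return None


def repair_or_replan(state, remaining, goal, actions, repair_depth=2, replan_depth=20):
    # Repair: a short bridge to the earliest suffix of the old plan that still reaches the goal.
    for i in range(len(remaining) + 1):
        suffix = remaining[i:]

        def reaches_goal(s):
            end = run(s, suffix)
            return end is not None and goal <= end

        bridge = bfs(state, actions, reaches_goal, repair_depth)
        if bridge is not None:
            return "repair", bridge + suffix
    # Fallback: treat the current state as a new initial state and plan from scratch.
    plan = bfs(state, actions, lambda s: goal <= s, replan_depth)
    return ("replan", plan) if plan is not None else ("fail", None)


def move(src, dst):
    return Action(f"go_{src}_{dst}", frozenset({f"at_{src}"}),
                  frozenset({f"at_{dst}"}), frozenset({f"at_{src}"}))


actions = [move("a", "b"), move("b", "c"), move("c", "d"), move("x", "b")]
remaining = [move("a", "b"), move("b", "c"), move("c", "d")]
goal = frozenset({"at_d"})
slipped = frozenset({"at_x"})  # the robot ended up somewhere the plan did not expect

mode, plan = repair_or_replan(slipped, remaining, goal, actions)
print(mode, [a.name for a in plan])  # repair ['go_x_b', 'go_b_c', 'go_c_d']
mode, plan = repair_or_replan(slipped, remaining, goal, actions, repair_depth=0)
print(mode, [a.name for a in plan])  # replan ['go_x_b', 'go_b_c', 'go_c_d']
```

Preferring the earliest workable suffix keeps as much of the old plan as possible. The `repair_depth` budget is where the time-constraint trade-off would be tuned.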

04

Plan-Space Repair

This is a primary algorithmic approach where repair operations are performed directly on the plan structure itself, not by searching the state space. Common repair operators include:

  • Action insertion: Adding a new action to achieve a missing precondition.
  • Action reordering: Changing the sequence of actions to resolve a causal link threat or resource conflict.
  • Action substitution: Replacing a failed action with a different one that achieves the same subgoal.
  • Goal re-establishment: Adding actions to re-achieve a goal fact that was made true earlier but has since become false.

These operators are applied iteratively until the plan is consistent and reaches the goal from the current state.

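Applying these operators iteratively can be sketched as a breadth-first search over plan edits. The sketch assumes a totally ordered STRIPS-style plan and a library of known actions. Each level applies one more insertion, substitution, or adjacent swap. The first plan that executes from the current state to the goal is returned. Goal re-establishment appears here as an insertion late in the plan. The names and the forklift scenario are hypothetical. Exhaustive edit search like this only scales to small plans; practical systems use heuristics to choose which operator to apply.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    pre: frozenset
    add: frozenset = frozenset()
    delete: frozenset = frozenset()


def run(state, plan):
    """Final state if the plan is executable from `state`, else None."""
    for a in plan:
        if not a.pre <= state:
            return None
        state = (state - a.delete) | a.add
    return state


def neighbors(plan, library):
    """Every plan reachable by one application of a repair operator."""
    for i in range(len(plan) + 1):              # action insertion
        for a in library:
            yield plan[:i] + (a,) + plan[i:]
    for i in range(len(plan)):                  # action substitution
        for a in library:
            if a != plan[i]:
                yield plan[:i] + (a,) + plan[i + 1:]
    for i in range(len(plan) - 1):              # action reordering (adjacent swap)
        yield plan[:i] + (plan[i + 1], plan[i]) + plan[i + 2:]


def plan_space_repair(state, plan, goal, library, max_edits=2):
    """Breadth-first search over plan edits; returns a valid plan needing the fewest edits."""
    frontier, seen = [tuple(plan)], {tuple(plan)}
    for edits in range(max_edits + 1):
        for p in frontier:
            end = run(state, p)
            if end is not None and goal <= end:
                return list(p)
        if edits == max_edits:
            break
        next_frontier = []
        for p in frontier:
            for q in neighbors(p, library):
                if q not in seen:
                    seen.add(q)
                    next_frontier.append(q)
        frontier = next_frontier
    return None


def act(name, pre, add=(), delete=()):
    return Action(name, frozenset(pre), frozenset(add), frozenset(delete))


forklift_lift = act("forklift_lift", {"crate_on_floor", "forklift_ok"}, {"crate_raised"}, {"crate_on_floor"})
crane_lift = act("crane_lift", {"crate_on_floor"}, {"crate_raised"}, {"crate_on_floor"})
load_truck = act("load_truck", {"crate_raised"}, {"crate_loaded"}, {"crate_raised"})

observed = frozenset({"crate_on_floor"})  # "forklift_ok" is unexpectedly false
fixed = plan_space_repair(observed, [forklift_lift, load_truck], frozenset({"crate_loaded"}),
                          [forklift_lift, crane_lift, load_truck])
print([a.name for a in fixed])  # ['crane_lift', 'load_truck']
```
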
05

Dependency-Directed Repair

This sophisticated technique analyzes the causal structure of the plan to understand why the failure occurred. It builds a dependency graph linking actions through their preconditions and effects. When a failure is detected (e.g., a required precondition is false), the algorithm backtracks through this graph to find the culprit action whose effect was expected but did not materialize, or whose effect was unexpectedly deleted. Repair then focuses on this subgraph, minimizing changes to unrelated parts of the plan. This method is more informed than blind plan-space operators.
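
The dependency analysis can be sketched with explicit causal links. The sketch assumes a totally ordered STRIPS-style plan in which each precondition is credited to the most recent earlier step that added it, or to the initial state. The hypothetical `repair_scope` traces each unmet precondition back to its producer, the culprit. It then returns the remaining steps that lose their support, so everything outside that subgraph can be kept unchanged. The domain and all names are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    pre: frozenset
    add: frozenset = frozenset()
    delete: frozenset = frozenset()


def causal_links(init, plan):
    """Map (consumer index, fact) -> producer index; -1 is the initial state,
    None means neither an earlier step nor the initial state supplied the fact."""
    links, producer = {}, {f: -1 for f in init}
    for j, a in enumerate(plan):
        for f in a.pre:
            links[(j, f)] = producer.get(f)
        for f in a.delete:
            producer.pop(f, None)
        for f in a.add:
            producer[f] = j
    return links


def repair_scope(links, plan, step, observed):
    """Trace unmet preconditions of plan[step] back to their producers (culprits)
    and collect the remaining steps that depend on the broken support."""
    missing = plan[step].pre - observed
    culprits = {f: links[(step, f)] for f in missing}
    # Remaining steps that expected a missing fact from the same producer...
    broken = {j for (j, f), p in links.items()
              if j >= step and f in missing and p == culprits[f]}
    # ...plus everything that transitively consumes effects of a broken step.
    stack = list(broken)
    while stack:
        i = stack.pop()
        for (j, _), p in links.items():
            if p == i and j not in broken:
                broken.add(j)
                stack.append(j)
    return culprits, broken


def act(name, pre, add=(), delete=()):
    return Action(name, frozenset(pre), frozenset(add), frozenset(delete))


plan = [
    act("goto_door", {"at_shelf"}, {"at_door"}, {"at_shelf"}),               # step 0
    act("open_door", {"at_door"}, {"door_open"}),                            # step 1
    act("enter_room", {"at_door", "door_open"}, {"in_room"}, {"at_door"}),   # step 2
    act("drop_parcel", {"in_room", "holding"}, {"delivered"}, {"holding"}),  # step 3
    act("send_report", {"battery_ok"}, {"report_sent"}),                     # step 4
]
links = causal_links({"at_shelf", "holding", "battery_ok"}, plan)

# Step 0 ran, but the robot never reached the door; step 1 is about to fail.
observed = frozenset({"at_shelf", "holding", "battery_ok"})
culprits, broken = repair_scope(links, plan, 1, observed)
print(culprits)        # {'at_door': 0}
print(sorted(broken))  # [1, 2, 3]; step 4 is unaffected and can be kept as-is
```
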

06

Integration with Contingency Planning

Robust autonomous systems often combine plan repair with contingency planning. Before execution, the planner may generate a primary plan alongside a set of expected failure modes and pre-computed repair strategies or branching points. During execution, if a monitored failure matches a predicted contingency, the corresponding repair patch can be applied immediately. This hybrid approach blends the efficiency of pre-computation with the flexibility of runtime repair, creating systems that are both robust and responsive. It is especially valuable in domains with known, high-probability uncertainties.
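
The hybrid scheme can be sketched as a lookup table keyed by anticipated failure signatures, consulted before any runtime repair is attempted. This is a minimal sketch under several assumptions. Failures are identified by a (step, cause) pair, and plans are lists of action names. `slow_repair` stands in for whatever online repair method the system uses. All names are hypothetical.

```python
def handle_failure(failed_step, cause, rest, contingencies, runtime_repair):
    """Use a precomputed patch when the failure was anticipated; otherwise fall
    back to (slower) runtime repair. Patches replace the failed step."""
    patch = contingencies.get((failed_step, cause))
    if patch is not None:
        return "contingency", patch + rest
    return "runtime", runtime_repair(failed_step, cause, rest)


# Hypothetical failure signatures -> precomputed patches, built before execution.
contingencies = {
    ("open_door", "door_locked"): ["request_key", "unlock_door", "open_door"],
    ("goto_room", "corridor_blocked"): ["goto_service_corridor", "goto_room"],
}


def slow_repair(failed_step, cause, rest):
    # Placeholder for an online repair algorithm (assumed, not shown here).
    return ["wait_for_operator", failed_step] + rest


rest = ["goto_room", "drop_parcel"]
print(handle_failure("open_door", "door_locked", rest, contingencies, slow_repair))
# ('contingency', ['request_key', 'unlock_door', 'open_door', 'goto_room', 'drop_parcel'])
print(handle_failure("open_door", "motor_fault", rest, contingencies, slow_repair))
# ('runtime', ['wait_for_operator', 'open_door', 'goto_room', 'drop_parcel'])
```
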


How Plan Repair Works


Plan repair is the dynamic process of modifying a previously generated plan when its execution fails due to an unexpected state deviation or action failure. Instead of discarding the entire plan and initiating a costly full re-planning cycle, repair algorithms attempt local modifications to restore the plan's feasibility from the current, altered world state. Typical modifications are inserting, deleting, or reordering actions. This is often more computationally efficient than complete re-planning. It is essential for autonomous agents operating in non-deterministic, real-world environments where perfect execution cannot be guaranteed.
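
The monitor-and-patch cycle can be sketched as an execution loop. The sketch assumes a STRIPS-style model with positive preconditions and a simulated environment that injects scripted exogenous changes. Before each step, the loop compares the step against the observed state. If the step's effects already hold, it deletes the step. If a precondition is missing, it inserts a supporting library action. Otherwise it executes the step. Reordering is omitted for brevity, and `make_env`, `execute_with_repair`, and the delivery domain are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    pre: frozenset
    add: frozenset = frozenset()
    delete: frozenset = frozenset()


def act(name, pre, add=(), delete=()):
    return Action(name, frozenset(pre), frozenset(add), frozenset(delete))


def make_env(disturbances):
    """Simulated world (hypothetical): applies each action's effects, then any
    scripted exogenous change keyed by how many actions have run so far."""
    steps_run = 0

    def env_step(state, action):
        nonlocal steps_run
        steps_run += 1
        state = (state - action.delete) | action.add
        added, deleted = disturbances.get(steps_run, (frozenset(), frozenset()))
        return (state - deleted) | added

    return env_step


def execute_with_repair(state, plan, goal, library, env_step, max_steps=50):
    """Execute step by step, checking each step against the observed state first."""
    plan, trace = list(plan), []
    while plan and len(trace) < max_steps:
        step = plan[0]
        if step.add and step.add <= state:
            # Deletion: the step's effects already hold, so it is redundant
            # (safe here because preconditions are positive facts only).
            plan.pop(0)
            continue
        missing = step.pre - state
        if missing:
            # Insertion: put a library action that supplies the missing facts first.
            patch = next((a for a in library if missing <= a.add and a.pre <= state), None)
            if patch is None:
                return trace, False  # local repair failed; escalate to full replanning
            plan.insert(0, patch)
            continue
        state = env_step(state, plan.pop(0))  # execute, then observe the real outcome
        trace.append(step.name)
    return trace, goal <= state


plan = [
    act("goto_door", {"at_shelf"}, {"at_door"}, {"at_shelf"}),
    act("open_door", {"at_door", "door_unlocked"}, {"door_open"}),
    act("enter_room", {"at_door", "door_open"}, {"in_room"}, {"at_door"}),
    act("drop_parcel", {"in_room", "holding"}, {"delivered"}, {"holding"}),
]
library = [act("request_key", {"at_door"}, {"door_unlocked"})]
start, goal = frozenset({"at_shelf", "holding", "door_unlocked"}), frozenset({"delivered"})

# Someone locks the door right after the first action: a step gets inserted.
locked = make_env({1: (frozenset(), frozenset({"door_unlocked"}))})
print(execute_with_repair(start, plan, goal, library, locked))
# (['goto_door', 'request_key', 'open_door', 'enter_room', 'drop_parcel'], True)

# Someone opens the door for the robot instead: open_door is deleted as redundant.
opened = make_env({1: (frozenset({"door_open"}), frozenset())})
print(execute_with_repair(start, plan, goal, library, opened))
# (['goto_door', 'enter_room', 'drop_parcel'], True)
```
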

Effective plan repair relies on maintaining the causal link structure of the original plan, which records which action supplies each precondition (subgoal) of a later action. When a failure is detected, the system identifies the broken causal links, meaning preconditions that are no longer supported, and searches for minimal patches to re-establish them. Common techniques draw on partial-order causal link (POCL) planning, a least-commitment approach. New actions are inserted to support unmet preconditions. Threats are actions that could delete a protected condition between its producer and its consumer, and they are typically resolved by adding ordering constraints. The goal is to produce a valid plan that achieves the original objectives from the new current state with minimal disruption to the remaining, still-valid plan steps.
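
Threat resolution in POCL planning can be sketched on a partial-order plan with three parts: step names, a set of ordering pairs `(before, after)`, and causal links `(producer, fact, consumer)`. A step that deletes a linked fact threatens the link if it could be ordered between producer and consumer. The threat is resolved by demotion (threat before producer) or promotion (threat after consumer), whichever keeps the ordering acyclic. This is a greedy sketch with hypothetical names and no backtracking; a full POCL planner searches over these choices.

```python
def precedes(order, a, b):
    """True if a must come before b under the transitive closure of `order`."""
    stack, seen = [a], {a}
    while stack:
        x = stack.pop()
        for u, v in order:
            if u == x and v not in seen:
                if v == b:
                    return True
                seen.add(v)
                stack.append(v)
    return False


def find_threats(steps, links, order):
    """steps: name -> set of deleted facts; links: (producer, fact, consumer) triples."""
    threats = []
    for s, p, c in links:
        for t, deletes in steps.items():
            if t in (s, c) or p not in deletes:
                continue
            # t threatens the link only if it could still fall between s and c.
            if not precedes(order, t, s) and not precedes(order, c, t):
                threats.append((t, (s, p, c)))
    return threats


def resolve(steps, links, order):
    """Add ordering constraints until no threats remain, or return None."""
    order = set(order)
    while True:
        threats = find_threats(steps, links, order)
        if not threats:
            return order
        t, (s, _, c) = threats[0]
        for u, v in ((t, s), (c, t)):  # demotion first, then promotion
            if not precedes(order, v, u):  # adding u-before-v must not create a cycle
                order.add((u, v))
                break
        else:
            return None  # neither ordering works; a different repair is needed


steps = {"start": set(), "unlock": set(), "open": set(),
         "lock_up": {"door_unlocked"}, "finish": set()}
links = [("unlock", "door_unlocked", "open")]
order = {("start", "unlock"), ("unlock", "open"), ("unlock", "lock_up"),
         ("open", "finish"), ("lock_up", "finish")}

print(find_threats(steps, links, order))  # [('lock_up', ('unlock', 'door_unlocked', 'open'))]
fixed = resolve(steps, links, order)
print(("open", "lock_up") in fixed)       # True: promotion, since demotion would create a cycle
```
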

PLAN REPAIR

Frequently Asked Questions

Plan repair, closely related to replanning, is a critical capability for autonomous systems operating in dynamic environments. This FAQ addresses common technical questions about the mechanisms, trade-offs, and applications of modifying plans during execution.

Plan repair is the process of modifying an existing action sequence that has failed or become suboptimal during execution due to unexpected changes in the environment. It works by detecting a discrepancy between the expected and observed world state. It then applies algorithms that adjust the plan locally, instead of initiating a computationally expensive full re-planning cycle from scratch. Typical adjustments are removing invalid actions, reordering steps, or inserting new corrective actions. Common techniques include plan-space planning and the use of execution monitors to trigger repair when preconditions are violated.

Prasad Kumkar

About the author

Prasad Kumkar

CEO & MD, Inference Systems

Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over more than five years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.

His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.