Iterated Amplification: AI Alignment & Scalable Oversight | Inference Systems

Iterated Amplification

Iterated Amplification is an AI alignment proposal in which a weak supervisor oversees an AI system that assists with a task. The AI's assistance amplifies the supervisor's capabilities, and the process is repeated to oversee tasks of increasing complexity.

AI ALIGNMENT PROPOSAL

What is Iterated Amplification?

Iterated Amplification is a technical proposal for aligning advanced artificial intelligence systems with complex human values through an iterative, bootstrapping process of oversight.

Iterated Amplification, also known as Iterated Distillation and Amplification (IDA), is an AI alignment proposal in which a human supervisor iteratively trains an AI assistant by decomposing complex tasks into simpler subtasks they can oversee, using the AI's own assistance to amplify their capabilities at each step. The core mechanism is a distillation process: the AI learns to imitate the outputs of the amplified human-AI team, producing a new, more capable agent. This cycle repeats, in principle allowing the supervision of tasks far beyond unaided human ability, as each iteration bootstraps from the previous level of competence.
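The amplify-then-distill cycle described above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the task (computing 0 + 1 + ... + n), the names `human_with_assistant`, `distill`, and `iterate`, and the use of a lookup table as a stand-in for a trained model. It is not an implementation from the original proposal, only a minimal picture of how each round extends competence by one level.

```python
from typing import Dict, Optional

# Hypothetical stand-in for a trained model: a table from task n to its answer.
Model = Dict[int, int]


def human_with_assistant(n: int, model: Model) -> Optional[int]:
    """Amplified team: the human knows the base case and one decomposition step."""
    if n == 0:
        return 0                    # the human can answer this unaided
    sub_answer = model.get(n - 1)   # delegate the simpler subtask to the current model
    if sub_answer is None:
        return None                 # the team cannot solve this task yet
    return sub_answer + n           # the human combines the sub-answer with n


def distill(model: Model, tasks: range) -> Model:
    """Imitation step: the new model records the amplified team's outputs."""
    new_model = dict(model)
    for n in tasks:
        # The team always consults the OLD model, so each round adds one level.
        answer = human_with_assistant(n, model)
        if answer is not None:
            new_model[n] = answer
    return new_model


def iterate(rounds: int, tasks: range) -> Model:
    """Repeat amplification and distillation, starting from an empty model."""
    model: Model = {}
    for _ in range(rounds):
        model = distill(model, tasks)
    return model


print(iterate(4, range(10)))  # {0: 0, 1: 1, 2: 3, 3: 6}
```

After four rounds the distilled table covers tasks 0 through 3, even though the human alone could only answer task 0. A real system would replace the table with a learned model, which is where the generalization concerns arise.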

The proposal directly addresses the problem of scalable oversight, where human evaluators cannot directly judge an AI's performance on extremely complex tasks. It is conceptually related to Debate and Recursive Self-Improvement (RSI), but focuses on value alignment rather than capability gain alone. A key technical challenge is ensuring the distilled model robustly generalizes the supervisor's underlying intent, not just the surface-level behavior, to avoid ontology identification problems where the AI misinterprets the true goal.

ARCHITECTURAL COMPONENTS

Core Mechanisms of Iterated Amplification

Iterated Amplification is an AI alignment and capability amplification framework built on a recursive, human-in-the-loop process. Its core mechanisms define how a weak supervisor can iteratively oversee and train an AI system on tasks of escalating complexity.

The Amplification Loop

The amplification loop is the fundamental recursive process in which a human supervisor delegates subtasks of a complex problem to an AI assistant. The supervisor then synthesizes the AI's solutions into a final answer. This combined output demonstrates a capability greater than the supervisor's alone. The process can be summarized schematically as Amp(H) = H + AI(H), where the "+" denotes the human working together with the assistant rather than arithmetic addition, and the amplified human Amp(H) can solve problems initially beyond H's unaided capacity. The loop has four steps:

Decomposition: The supervisor breaks a task into smaller, manageable pieces.
Assistance: The AI solves these sub-problems.
Synthesis: The supervisor integrates the AI's outputs into a coherent solution.
Iteration: The resulting Amp(H) becomes the new supervisor for the next, more complex iteration.
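The four steps above can be sketched on a toy task, summing a list of integers. All names here (`weak_assistant`, `decompose`, `synthesize`, `amplify`) are hypothetical, and the sketch assumes the human's role is fixed (split the list, add two partial sums) while the amplified system from one level is reused as the assistant at the next. It omits distillation entirely and is meant only to show how capability grows with each iteration.

```python
from typing import Callable, List

Task = List[int]
Assistant = Callable[[Task], int]


def weak_assistant(task: Task) -> int:
    """Hypothetical starting assistant: only handles lists of length <= 1."""
    if len(task) > 1:
        raise ValueError("task too large for this assistant")
    return task[0] if task else 0


def decompose(task: Task) -> List[Task]:
    """Decomposition: the supervisor splits the task into two halves."""
    mid = len(task) // 2
    return [task[:mid], task[mid:]]


def synthesize(partials: List[int]) -> int:
    """Synthesis: the supervisor only needs to add two partial results."""
    return partials[0] + partials[1]


def amplify(assistant: Assistant) -> Assistant:
    """Return the amplified system: supervisor plus the given assistant."""
    def amplified(task: Task) -> int:
        if len(task) <= 1:
            return assistant(task)
        subtasks = decompose(task)
        partials = [assistant(sub) for sub in subtasks]  # Assistance
        return synthesize(partials)
    return amplified


# Iteration: each amplified system serves as the assistant at the next level.
level1 = amplify(weak_assistant)  # handles lists of up to 2 items
level2 = amplify(level1)          # handles lists of up to 4 items
level3 = amplify(level2)          # handles lists of up to 8 items

print(level3(list(range(1, 9))))  # 36
```

In this sketch `level1` raises `ValueError` on a three-item list, because one half exceeds what the starting assistant can handle, while `level2` solves it. Each round of amplification doubles the size of task the combined system can oversee.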

ITERATED AMPLIFICATION

Frequently Asked Questions

Iterated Amplification is a foundational AI alignment proposal for overseeing systems that can perform tasks beyond direct human comprehension. This FAQ addresses its core mechanisms, relationship to other techniques, and practical implications.

Iterated Amplification (IDA) is a proposed AI alignment technique in which a human supervisor, initially able to judge only small, manageable pieces of a complex task, iteratively trains an AI assistant by providing feedback. The AI's assistance then amplifies the supervisor's capability, allowing them to oversee increasingly complex tasks in a bootstrapped process. The core goal is to develop a reliable oversight mechanism for AI systems whose problem-solving abilities may eventually surpass direct human understanding, so that the AI's behavior remains aligned with human intent even on tasks too intricate for unaided human evaluation.

Related Terms

Scalable Oversight

Distillation & Imitation Learning

Oversight & The HCH Sequence

Decomposition & Task Specification

Iterative Expansion of Complexity

Contrast with Reinforcement Learning from Human Feedback (RLHF)

Debate

Recursive Reward Modeling

Corrigibility

Human-in-the-Loop (HITL)

AI Safety via Debate