Glossary

Self-Critique

Self-Critique is a prompting technique where a language model is instructed to review and evaluate its own initial output or reasoning chain, identifying potential errors, inconsistencies, or areas for improvement before producing a final, refined answer.

Get in touch Learn more

ML engineer working on model compression and quantization, laptop showing performance benchmarks, technical workspace.

AGENTIC COGNITIVE ARCHITECTURES

What is Self-Critique?

A prompting technique within Chain-of-Thought reasoning where a language model reviews and refines its own output.

Self-Critique is a prompting technique where a language model is instructed to review, evaluate, and improve its own initial output or reasoning chain. The model acts as its own critic, identifying potential errors, logical inconsistencies, or areas for improvement before generating a final, refined answer. This creates a simple, single-agent feedback loop, enhancing output reliability without external verification. It is a foundational method within agentic cognitive architectures for building self-correcting systems.

The technique typically involves a multi-turn prompt where the model first produces an answer, then receives an instruction to critique that answer. Common directives include identifying flawed assumptions, checking factual accuracy, or suggesting more precise phrasing. This process is closely related to Chain-of-Verification (CoVe) and leverages the model's inherent capacity for meta-cognition. When integrated with tool-augmented reasoning, the critique can trigger external fact-checking or calculation, moving beyond purely internal reflection.

AGENTIC COGNITIVE ARCHITECTURES

Core Characteristics of Self-Critique

Iterative Refinement Loop

Self-Critique establishes a closed-loop feedback system within a single model inference session. The core mechanism involves:

Initial Generation: The model produces a first-pass answer or reasoning chain (e.g., a Chain-of-Thought).
Critical Evaluation: The model, following a meta-prompt, switches roles to act as an evaluator, scrutinizing its own output for logical fallacies, factual inaccuracies, or missed assumptions.
Final Synthesis: Using the critique, the model generates a revised, improved final output. This creates a single-agent, multi-turn dialogue that mimics a human revising a draft, significantly improving output quality without external verification.

Explicit Meta-Cognitive Prompting

The technique relies on structured meta-instructions that force the model to adopt a critical perspective. Effective prompts explicitly define the evaluation criteria. For example:

"Review the following solution for calculation errors."
"Identify any unsupported assumptions in the argument below."
"Check the consistency between the stated premises and the final conclusion." This shifts the model from a generative mode to an analytical mode. The prompt architecture is crucial; vague instructions like "Is this good?" yield poor results, while specific, role-based prompts ("Act as a rigorous peer reviewer") elicit meaningful self-assessment.

Error Detection & Hallucination Mitigation

A primary utility of Self-Critique is identifying and correcting model hallucinations and reasoning breakdowns. The model is tasked to flag:

Factual Contradictions: Internal inconsistencies within the generated text.
Unsubstantiated Claims: Statements presented as fact without evidence in the provided context.
Logical Non-Sequiturs: Conclusions that do not follow from the provided reasoning steps.
Mathematical Errors: Mistakes in arithmetic or symbolic manipulation. By surfacing these issues in the critique phase, the final output has higher factual fidelity and logical coherence, making the technique vital for applications requiring high accuracy, such as technical analysis or summarization of complex documents.

Distinction from External Verification

Self-Critique is fundamentally different from using a separate, external model or tool for verification. Key differentiators include:

Single-Model Paradigm: The same model parameters and knowledge base are used for both generation and critique. This is computationally efficient but means the critique is limited by the model's own knowledge and biases.
No Ground Truth Required: Unlike supervised evaluation, the model critiques its own work without access to a canonical correct answer.
Contrast with Self-Consistency: While Self-Consistency samples multiple independent reasoning paths and votes on answers, Self-Critique sequentially refines a single reasoning path. It is complementary to techniques like Chain-of-Verification (CoVe), which is a more structured, multi-step instantiation of the self-critique principle.

Integration with Agentic Workflows

In agentic cognitive architectures, Self-Critique functions as a core reflection module. It is a building block for more complex loops:

Planning Phase: An agent can generate a plan, critique it for feasibility, and then refine it before execution.
Tool-Use Validation: After performing an action or API call, the agent can critique the result's validity before proceeding.
Recursive Error Correction: Failed actions can trigger a self-critique loop to diagnose the cause and adjust strategy. This makes the agent self-correcting and more robust. It is a simpler, more immediate form of reflection compared to Reinforcement Learning from AI Feedback (RLAIF), which requires training a separate reward model.

Limitations and Failure Modes

Self-Critique is not a panacea and has inherent limitations:

Complacent Agreement: The model may fail to identify its own errors, producing a shallow or affirming critique that misses fundamental flaws—a form of confirmation bias.
Knowledge Boundary: The model cannot critique information outside its training data or identify subtle factual errors it itself believes to be true.
Critique Hallucinations: The model may invent problems that don't exist or propose incorrect corrections, potentially degrading the final answer.
Prompt Sensitivity: Performance is highly dependent on the exact phrasing of the critique instruction. Mitigations include prompt ensembling (trying multiple critique prompts) and hybrid approaches that combine self-critique with retrieval-augmented verification for factual grounding.

AGENTIC COGNITIVE ARCHITECTURES

How Self-Critique Works: A Technical Mechanism

The mechanism typically follows a multi-turn prompting sequence. First, the model generates an initial response or Chain-of-Thought (CoT). A subsequent, distinct prompt then instructs the same model to act as a critic or verifier, analyzing the initial output for logical flaws, factual inaccuracies, or missed assumptions. This creates an internal feedback loop, separating the generative and evaluative cognitive modes to reduce confirmation bias and improve output reliability.

Technically, this is implemented by structuring the conversation history. The initial answer becomes context for the critique prompt, which often uses role-playing instructions (e.g., 'You are a meticulous reviewer'). The model's final output synthesizes or is regenerated based on this self-assessment. This process is foundational to agentic architectures like Chain-of-Verification (CoVe) and is a precursor to more automated recursive error correction systems.

CHAIN-OF-THOUGHT REASONING

Frequently Asked Questions

Common questions about Self-Critique, a prompting technique that enhances the reliability of language model outputs by having the model review and refine its own reasoning.

Self-Critique is a prompting technique where a language model is instructed to review and evaluate its own initial output or reasoning chain, identifying potential errors, inconsistencies, or areas for improvement before producing a final, refined answer. It operationalizes a form of metacognition, forcing the model to step back from its initial generation and apply a critical lens. This is typically implemented through a multi-turn prompt structure: the model first generates an answer, then receives an instruction like "Review this answer for logical fallacies, factual inaccuracies, or missed details," and finally produces a revised version. The technique is foundational for building more reliable, auditable, and self-correcting agentic systems without requiring external verification for every step.

Enabling Efficiency, Speed & Accuracy

Intelligent Analysis, Decision & Execution

We build AI systems for teams that need search across company data, workflow automation across tools, or AI features inside products and internal software.

Talk to Us

Search across company data

Give teams answers from docs, tickets, runbooks, and product data with sources and permissions.

Useful when people spend too long searching or get different answers from different systems.

Enterprise searchRAGPermissions

Automate internal workflows

Use AI to route work, draft outputs, trigger actions, and keep approvals and logs in place.

Useful when repetitive work moves across multiple tools and teams.

AI agentsWorkflow automationGovernance

Add AI to products and internal tools

Build assistants, guided actions, or decision support into the software your team or customers already use.

Useful when AI needs to be part of the product, not a separate tool.

AI integrationDecision supportModel routing

CHAIN-OF-THOUGHT REASONING

Related Terms

Self-Critique is a key component of advanced reasoning loops. These related techniques and frameworks enable language models to decompose, verify, and refine their own outputs.

Chain-of-Verification (CoVe)

A structured fact-checking method where a model:

Generates an initial baseline answer.
Plans a set of verification questions to audit its own claims.
Executes those verifications, often via retrieval.
Produces a final, revised answer based on the audit.

Key Difference: While Self-Critique is a single review step, CoVe is a multi-phase, planned verification cycle explicitly focused on factual accuracy.

Process Supervision

A training paradigm where a model receives reward signals or feedback for each individual step in a reasoning chain, not just the final output. This is often implemented using a Process Reward Model (PRM).

Relation to Self-Critique: Process Supervision trains a model's internal critique capability. Self-Critique is the inference-time application of a similar step-wise evaluation skill, often emerging from such training.

Faithfulness Metrics

Quantitative measures that evaluate whether a model's generated reasoning steps are:

Logically consistent with each other.
Factually correct and grounded.
Genuinely causal in leading to the final answer (not post-hoc rationalizations).

Application: These metrics (e.g., entailment-based scores) are used to automatically assess the quality of a model's Self-Critique, determining if the critique is itself faithful.

ReAct (Reasoning + Acting)

A framework that interleaves verbalized reasoning traces with actionable steps (tool/API calls). The model operates in a loop: Thought → Act → Observe.

Critique Integration: In advanced ReAct agents, a Self-Critique step can be inserted into the 'Thought' phase to evaluate the last observation or planned action before execution, leading to more robust and error-aware agents.

Self-Consistency

A decoding strategy that improves reliability by:

Sampling multiple, diverse reasoning paths (Chain-of-Thoughts) for a single problem.
Generating a final answer from each path.
Selecting the most frequent final answer via majority vote.

Contrast with Self-Critique: Self-Consistency aggregates outputs from different reasoning attempts. Self-Critique refines a single reasoning attempt through internal review. They are complementary techniques.

Instructional Scaffolding

A prompt engineering technique that structures a task with graduated hints, decompositions, or meta-instructions to guide a model. A common pattern is to explicitly instruct the model to work in phases: 'First, plan your approach. Second, execute the steps. Third, review your work for errors.'

Foundation for Self-Critique: Explicit scaffolding in a prompt (e.g., 'Critique your initial answer...') is the primary method for eliciting Self-Critique behavior from models not explicitly trained for it.

About the author

Prasad Kumkar

CEO & MD, Inference Systems

Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over 5+ years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.

His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.

Limited slotsGet a Free AI Consultation

How We Work

Custom AI workflows for your Business

One-fit-all AI don't work for modern businesses. At Inferensys, we aim to understand your business & custom requirements; which we use to define most efficient agentic workflows, the data, and the tools for your business.

Talk to Us

Self-Critique

What is Self-Critique?

Core Characteristics of Self-Critique

Iterative Refinement Loop

Explicit Meta-Cognitive Prompting

Error Detection & Hallucination Mitigation

Distinction from External Verification

Integration with Agentic Workflows

Limitations and Failure Modes

How Self-Critique Works: A Technical Mechanism

Frequently Asked Questions

Intelligent Analysis, Decision & Execution

Search across company data

Automate internal workflows

Add AI to products and internal tools

Prasad Kumkar

Partnered with leading AI, data, and software stack.

Custom AI workflows for your Business

Review the use case

Pick the right approach

Build the first useful version

Improve from there