Inferensys

Glossary

Self-Critique

Self-Critique is a prompting technique where a language model is instructed to review and evaluate its own initial output or reasoning chain, identifying potential errors, inconsistencies, or areas for improvement before producing a final, refined answer.
ML engineer working on model compression and quantization, laptop showing performance benchmarks, technical workspace.
AGENTIC COGNITIVE ARCHITECTURES

What is Self-Critique?

A prompting technique within Chain-of-Thought reasoning where a language model reviews and refines its own output.

Self-Critique is a prompting technique where a language model is instructed to review, evaluate, and improve its own initial output or reasoning chain. The model acts as its own critic, identifying potential errors, logical inconsistencies, or areas for improvement before generating a final, refined answer. This creates a simple, single-agent feedback loop, enhancing output reliability without external verification. It is a foundational method within agentic cognitive architectures for building self-correcting systems.

The technique typically involves a multi-turn prompt where the model first produces an answer, then receives an instruction to critique that answer. Common directives include identifying flawed assumptions, checking factual accuracy, or suggesting more precise phrasing. This process is closely related to Chain-of-Verification (CoVe) and leverages the model's inherent capacity for meta-cognition. When integrated with tool-augmented reasoning, the critique can trigger external fact-checking or calculation, moving beyond purely internal reflection.

AGENTIC COGNITIVE ARCHITECTURES

Core Characteristics of Self-Critique

Self-Critique is a prompting technique where a language model is instructed to review and evaluate its own initial output or reasoning chain, identifying potential errors, inconsistencies, or areas for improvement before producing a final, refined answer. The following cards detail its defining operational characteristics.

01

Iterative Refinement Loop

Self-Critique establishes a closed-loop feedback system within a single model inference session. The core mechanism involves:

  • Initial Generation: The model produces a first-pass answer or reasoning chain (e.g., a Chain-of-Thought).
  • Critical Evaluation: The model, following a meta-prompt, switches roles to act as an evaluator, scrutinizing its own output for logical fallacies, factual inaccuracies, or missed assumptions.
  • Final Synthesis: Using the critique, the model generates a revised, improved final output. This creates a single-agent, multi-turn dialogue that mimics a human revising a draft, significantly improving output quality without external verification.
02

Explicit Meta-Cognitive Prompting

The technique relies on structured meta-instructions that force the model to adopt a critical perspective. Effective prompts explicitly define the evaluation criteria. For example:

  • "Review the following solution for calculation errors."
  • "Identify any unsupported assumptions in the argument below."
  • "Check the consistency between the stated premises and the final conclusion." This shifts the model from a generative mode to an analytical mode. The prompt architecture is crucial; vague instructions like "Is this good?" yield poor results, while specific, role-based prompts ("Act as a rigorous peer reviewer") elicit meaningful self-assessment.
03

Error Detection & Hallucination Mitigation

A primary utility of Self-Critique is identifying and correcting model hallucinations and reasoning breakdowns. The model is tasked to flag:

  • Factual Contradictions: Internal inconsistencies within the generated text.
  • Unsubstantiated Claims: Statements presented as fact without evidence in the provided context.
  • Logical Non-Sequiturs: Conclusions that do not follow from the provided reasoning steps.
  • Mathematical Errors: Mistakes in arithmetic or symbolic manipulation. By surfacing these issues in the critique phase, the final output has higher factual fidelity and logical coherence, making the technique vital for applications requiring high accuracy, such as technical analysis or summarization of complex documents.
04

Distinction from External Verification

Self-Critique is fundamentally different from using a separate, external model or tool for verification. Key differentiators include:

  • Single-Model Paradigm: The same model parameters and knowledge base are used for both generation and critique. This is computationally efficient but means the critique is limited by the model's own knowledge and biases.
  • No Ground Truth Required: Unlike supervised evaluation, the model critiques its own work without access to a canonical correct answer.
  • Contrast with Self-Consistency: While Self-Consistency samples multiple independent reasoning paths and votes on answers, Self-Critique sequentially refines a single reasoning path. It is complementary to techniques like Chain-of-Verification (CoVe), which is a more structured, multi-step instantiation of the self-critique principle.
05

Integration with Agentic Workflows

In agentic cognitive architectures, Self-Critique functions as a core reflection module. It is a building block for more complex loops:

  • Planning Phase: An agent can generate a plan, critique it for feasibility, and then refine it before execution.
  • Tool-Use Validation: After performing an action or API call, the agent can critique the result's validity before proceeding.
  • Recursive Error Correction: Failed actions can trigger a self-critique loop to diagnose the cause and adjust strategy. This makes the agent self-correcting and more robust. It is a simpler, more immediate form of reflection compared to Reinforcement Learning from AI Feedback (RLAIF), which requires training a separate reward model.
06

Limitations and Failure Modes

Self-Critique is not a panacea and has inherent limitations:

  • Complacent Agreement: The model may fail to identify its own errors, producing a shallow or affirming critique that misses fundamental flaws—a form of confirmation bias.
  • Knowledge Boundary: The model cannot critique information outside its training data or identify subtle factual errors it itself believes to be true.
  • Critique Hallucinations: The model may invent problems that don't exist or propose incorrect corrections, potentially degrading the final answer.
  • Prompt Sensitivity: Performance is highly dependent on the exact phrasing of the critique instruction. Mitigations include prompt ensembling (trying multiple critique prompts) and hybrid approaches that combine self-critique with retrieval-augmented verification for factual grounding.
AGENTIC COGNITIVE ARCHITECTURES

How Self-Critique Works: A Technical Mechanism

Self-Critique is a prompting technique where a language model is instructed to review and evaluate its own initial output or reasoning chain, identifying potential errors, inconsistencies, or areas for improvement before producing a final, refined answer.

The mechanism typically follows a multi-turn prompting sequence. First, the model generates an initial response or Chain-of-Thought (CoT). A subsequent, distinct prompt then instructs the same model to act as a critic or verifier, analyzing the initial output for logical flaws, factual inaccuracies, or missed assumptions. This creates an internal feedback loop, separating the generative and evaluative cognitive modes to reduce confirmation bias and improve output reliability.

Technically, this is implemented by structuring the conversation history. The initial answer becomes context for the critique prompt, which often uses role-playing instructions (e.g., 'You are a meticulous reviewer'). The model's final output synthesizes or is regenerated based on this self-assessment. This process is foundational to agentic architectures like Chain-of-Verification (CoVe) and is a precursor to more automated recursive error correction systems.

CHAIN-OF-THOUGHT REASONING

Frequently Asked Questions

Common questions about Self-Critique, a prompting technique that enhances the reliability of language model outputs by having the model review and refine its own reasoning.

Self-Critique is a prompting technique where a language model is instructed to review and evaluate its own initial output or reasoning chain, identifying potential errors, inconsistencies, or areas for improvement before producing a final, refined answer. It operationalizes a form of metacognition, forcing the model to step back from its initial generation and apply a critical lens. This is typically implemented through a multi-turn prompt structure: the model first generates an answer, then receives an instruction like "Review this answer for logical fallacies, factual inaccuracies, or missed details," and finally produces a revised version. The technique is foundational for building more reliable, auditable, and self-correcting agentic systems without requiring external verification for every step.

Prasad Kumkar

About the author

Prasad Kumkar

CEO & MD, Inference Systems

Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over 5+ years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.

His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.