Paxos is a fault-tolerant consensus algorithm for asynchronous networks. It guarantees safety (all correct agents agree on the same value) and achieves liveness (a value is eventually chosen) once a majority of agents can communicate reliably. It operates through a series of proposal rounds carried out by proposer, acceptor, and learner roles. A value is "chosen" when a majority of acceptors accept the same proposal, which provides resilience to partial failures and network partitions. This makes it a cornerstone for building reliable state machine replication and distributed databases.
Paxos

What is Paxos?
Paxos is a foundational family of consensus algorithms that enables a group of distributed agents to agree on a single value or sequence of commands, even when some agents fail or network messages are delayed or lost.
Within multi-agent system orchestration, Paxos provides the formal mechanism for conflict resolution where agents must converge on a single decision, such as electing a leader or committing a transaction. Its variants, like Multi-Paxos, optimize for repeated consensus on a log of commands. Although Paxos is widely regarded as hard to understand, its guarantees are critical for systems requiring strong consistency. In CAP theorem terms it favors consistency over availability during a partition, and it influenced later algorithms like Raft, which prioritizes understandability.
Key Features of Paxos
The design of Paxos is built upon several core concepts that together ensure safety and, once the network is stable, liveness.
Safety and Liveness Guarantees
Paxos is defined by two fundamental properties. Safety ensures that if a value is chosen, it is the only value that can ever be chosen, preventing contradictory decisions; this holds no matter how many agents crash or how long messages are delayed. Liveness means that a value will eventually be chosen, but it holds only while a majority of agents are functioning and can communicate, and competing proposers do not keep preempting one another. These properties make Paxos a cornerstone for building reliable distributed systems where consistency is non-negotiable.
Majority Quorums
Paxos operates on the principle of majority quorums. For any decision (proposal acceptance or value commitment), a quorum—a majority of the participating agents—must agree. This design ensures progress can be made even if some agents fail or are partitioned from the network. A key insight is that any two quorums must intersect in at least one agent, which is critical for maintaining consistency and preventing conflicting decisions.
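The intersection property can be checked by brute force for small clusters. The following is a minimal sketch; the function names `majority` and `all_quorums_intersect` are illustrative and do not come from any Paxos library.

```python
from itertools import combinations


def majority(n: int) -> int:
    """Smallest number of acceptors that forms a majority of n."""
    return n // 2 + 1


def all_quorums_intersect(n: int) -> bool:
    """Brute-force check that every pair of majority quorums shares a member."""
    quorums = [set(q) for q in combinations(range(n), majority(n))]
    return all(q1 & q2 for q1, q2 in combinations(quorums, 2))


for n in (3, 4, 5):
    # Columns: cluster size, quorum size, crash failures tolerated, intersection holds
    print(n, majority(n), n - majority(n), all_quorums_intersect(n))
```

Note that a cluster of 4 tolerates no more failures than a cluster of 3, which is why odd cluster sizes are the usual choice.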
Roles: Proposers, Acceptors, and Learners
The algorithm defines three distinct logical roles:
- Proposers: Initiate proposals for a value.
- Acceptors: Form the quorums that vote on and accept proposals.
- Learners: Learn the final chosen value.
A single physical agent can play multiple roles. This separation of concerns clarifies the protocol's phases and is a key reason for its analyzability and widespread adoption in systems like distributed databases and configuration stores.
Two-Phase Protocol (Prepare/Promise & Accept/Accepted)
Paxos achieves consensus through a two-phase protocol:
- Prepare/Promise Phase: A proposer sends a Prepare request with a unique, monotonically increasing proposal number. Acceptors promise not to accept any proposal with a lower number and reply with the highest-numbered proposal they have already accepted (if any).
- Accept/Accepted Phase: If the proposer receives promises from a majority quorum, it sends an Accept request for a value. If any promise reported an already accepted proposal, the value must be the one from the highest-numbered proposal reported; otherwise the proposer may use its own value. If a majority of acceptors then accept this request, the value is chosen.
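The acceptor's side of the two phases can be sketched as a small state machine. This is an illustrative, in-memory sketch under stated assumptions: the class name, method names, and tuple-shaped replies are hypothetical, and a real acceptor would persist its state to stable storage before replying.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Acceptor:
    promised: int = -1                           # highest proposal number promised so far
    accepted: Optional[Tuple[int, str]] = None   # (number, value) last accepted, if any

    def on_prepare(self, n: int):
        """Phase 1: promise to ignore proposals numbered below n."""
        if n > self.promised:
            self.promised = n
            # Report the previously accepted proposal so the proposer can adopt it.
            return ("promise", n, self.accepted)
        return ("nack", self.promised, None)

    def on_accept(self, n: int, value: str):
        """Phase 2: accept unless a higher-numbered prepare has been promised."""
        if n >= self.promised:
            self.promised = n
            self.accepted = (n, value)
            return ("accepted", n, value)
        return ("nack", self.promised, None)


a = Acceptor()
print(a.on_prepare(1))      # promise, nothing accepted yet
print(a.on_accept(1, "x"))  # accepted
print(a.on_prepare(2))      # promise, reports (1, "x")
print(a.on_accept(1, "y"))  # nack: proposal 1 is now stale
```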
Leader Optimization (Multi-Paxos)
In its basic form, Paxos can be inefficient for agreeing on a sequence of values (a log). Multi-Paxos is a common optimization where a stable leader is elected to act as the sole proposer for a sequence of consensus instances. This eliminates the prepare phase for most instances after the first, dramatically improving throughput and latency. The leader is not a single point of failure: if it fails, another proposer runs the prepare phase with a higher proposal number and takes over.
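The saving can be illustrated with a rough count of proposer-to-acceptor round trips. This is a back-of-the-envelope sketch, assuming no competing proposers, no message loss, and no leader change; `round_trips` is a hypothetical helper, not part of any implementation.

```python
def round_trips(instances: int, stable_leader: bool) -> int:
    """Round trips needed to choose `instances` values under ideal conditions."""
    if instances == 0:
        return 0
    if stable_leader:
        # Multi-Paxos: one shared prepare round, then one accept round per instance.
        return 1 + instances
    # Basic Paxos: a prepare round and an accept round for every instance.
    return 2 * instances


print(round_trips(100, stable_leader=False))  # 200
print(round_trips(100, stable_leader=True))   # 101
```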
Asynchronous Network Model
Paxos is designed for an asynchronous network model, where messages can be arbitrarily delayed, duplicated, or lost, but are not corrupted. Its safety relies on no timing assumptions, which is crucial for real-world deployments. This model means Paxos cannot guarantee termination in a fully asynchronous system (a consequence of the FLP impossibility result), but it never chooses conflicting values, and once communication stabilizes and a single proposer can complete both phases, a value will be chosen. This makes it extremely robust.
How Does Paxos Work?
The core algorithm operates through proposals and promises managed by three agent roles: Proposers, Acceptors, and Learners. A proposer initiates agreement by sending a prepare request with a unique, increasing proposal number to a majority of acceptors. If an acceptor receives a prepare request with a number higher than any it has previously promised, it responds with a promise not to accept lower-numbered proposals and includes the highest-numbered proposal it has already accepted, if any. This phase gives the proposer the right to proceed with its proposal number and gathers any previously accepted state.
Upon receiving promises from a majority, the proposer sends an accept request containing its proposal number and a value. Crucially, if any promise reported a previously accepted proposal, the value must be the one from the highest-numbered such proposal; only when no acceptor reported one may the proposer use its own value. This rule is what ensures safety. An acceptor accepts the request unless it has since promised a higher-numbered proposal. Once a majority of acceptors accept the same proposal, the value is chosen and can be reliably learned by all learners. This two-phase structure guarantees that only one value can be chosen, providing fault tolerance in asynchronous networks.
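The proposer's choice of value after collecting promises can be sketched as a single function. This is a minimal sketch, assuming each promise carries either `None` or the `(number, value)` pair the acceptor last accepted; the name `pick_value` and this message shape are illustrative.

```python
from typing import List, Optional, Tuple

Accepted = Optional[Tuple[int, str]]  # (proposal number, value), or None if nothing accepted


def pick_value(promises: List[Accepted], own_value: str) -> str:
    """Value the proposer must send in its accept request."""
    reported = [p for p in promises if p is not None]
    if not reported:
        # No acceptor in the quorum has accepted anything: free to propose own value.
        return own_value
    # Otherwise adopt the value of the highest-numbered accepted proposal.
    return max(reported, key=lambda p: p[0])[1]


print(pick_value([None, None, None], "a"))          # a
print(pick_value([None, (3, "b"), (5, "c")], "a"))  # c
```

Adopting the reported value is what prevents a later proposer from overwriting a value that may already have been chosen.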
Frequently Asked Questions
Paxos is a foundational family of consensus algorithms for asynchronous networks. These questions address its core mechanics, practical applications, and how it compares to modern alternatives.
Paxos is a family of consensus algorithms that enables a group of distributed agents (or processes) in an asynchronous network to agree on a single value despite the failure of some participants or delayed messages. It works through a series of proposal rounds, each with two key phases: a Prepare/Promise phase and an Accept/Accepted phase. In the first phase, a proposer asks a majority of acceptors to promise not to accept lower-numbered proposals. If it succeeds, in the second phase it proposes a value; once a majority of acceptors accept that proposal, the value is chosen and the decision is irrevocable. Safety holds regardless of how many agents crash or how long messages are delayed, as long as failures are non-Byzantine. Liveness requires that a majority of acceptors remain operational and reachable and, in practice, a single distinguished leader (or proposer) so that competing proposers do not keep preempting each other.
Related Terms
Paxos is a foundational consensus algorithm. These related terms define the broader landscape of protocols and mechanisms used to achieve agreement, manage concurrency, and resolve conflicts in distributed multi-agent systems.
Consensus Algorithm
A consensus algorithm is a fault-tolerant distributed protocol that enables a group of agents or nodes to agree on a single data value or sequence of actions despite the failure of some participants. It is the core mechanism for achieving state machine replication and linearizable operations in distributed databases and blockchain systems.
- Purpose: Ensures all non-faulty nodes in a network agree on the same sequence of commands.
- Key Properties: Must guarantee safety (nothing bad happens, e.g., no two nodes decide different values) and liveness (something good eventually happens, e.g., a value is decided).
- Examples: Paxos, Raft, and Byzantine Fault Tolerance (BFT) variants like PBFT.
Raft
Raft is a consensus algorithm designed explicitly for understandability, created as a more accessible alternative to Paxos. It elects a single leader node responsible for managing log replication to follower nodes, ensuring agreement across a distributed cluster.
- Leader-Based: Simplifies management by centralizing coordination through an elected leader.
- Log-Centric: All changes go through the leader's log, which is replicated to followers for consistency.
- Understandability: Its decomposition into leader election, log replication, and safety makes it easier to teach and implement correctly than classical Paxos.
- Use Case: The consensus engine for systems like etcd and Consul.
Byzantine Fault Tolerance (BFT)
Byzantine Fault Tolerance (BFT) is the property of a consensus system to reach agreement correctly even when some agents fail arbitrarily or behave maliciously (known as Byzantine failures). This is a stricter requirement than handling simple crashes.
- Threat Model: Assumes nodes can send conflicting, incorrect, or manipulated messages.
- Requirements: Typically requires a higher number of replicas (e.g., 3f+1 to tolerate f faulty nodes) compared to crash-fault-tolerant protocols like Paxos.
- Practical Implementation: Practical Byzantine Fault Tolerance (PBFT) is a seminal algorithm that uses a three-phase protocol with a primary node and backups to achieve consensus despite malicious actors.
- Application: Critical for blockchain networks (e.g., permissioned chains) and high-security financial systems.
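The difference in replica requirements can be made concrete with a small calculation. This sketch uses the standard bounds of 2f+1 replicas for crash faults (as in Paxos) and 3f+1 for Byzantine faults; `min_replicas` is a hypothetical helper name.

```python
def min_replicas(f: int, byzantine: bool) -> int:
    """Minimum cluster size needed to tolerate f faulty nodes."""
    return 3 * f + 1 if byzantine else 2 * f + 1


for f in (1, 2, 3):
    # Columns: faults tolerated, crash-fault cluster size, Byzantine cluster size
    print(f, min_replicas(f, byzantine=False), min_replicas(f, byzantine=True))
```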
Two-Phase Commit (2PC)
Two-Phase Commit (2PC) is a distributed atomic commitment protocol that ensures atomicity across multiple participating agents or databases. A central coordinator orchestrates the process to guarantee all participants either commit or abort a transaction.
- Phase 1 (Prepare): The coordinator asks all participants if they can commit. Participants vote "Yes" or "No."
- Phase 2 (Commit/Rollback): If all vote "Yes," the coordinator sends a commit command. If any vote "No," it sends an abort command.
- Blocking Nature: It is a blocking protocol; if the coordinator fails after participants have voted "Yes," they may be left in an uncertain state, holding locks until the coordinator recovers or an operator intervenes.
- Contrast with Paxos: 2PC is a transaction protocol for atomic commitment, while Paxos is a generic consensus protocol for agreeing on a sequence of values, often used to build more robust coordinators.
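The coordinator's decision rule can be sketched in a few lines. This is a simplified, failure-free sketch: participants are modeled as hypothetical vote callbacks, and timeouts, logging, and recovery are omitted.

```python
from typing import Callable, Dict


def two_phase_commit(participants: Dict[str, Callable[[], bool]]) -> str:
    """Return 'commit' only if every participant votes yes, otherwise 'abort'."""
    # Phase 1 (Prepare): ask every participant for its vote.
    votes = {name: vote() for name, vote in participants.items()}
    # Phase 2 (Commit/Rollback): a single "No" vote aborts the transaction.
    return "commit" if all(votes.values()) else "abort"


print(two_phase_commit({"db1": lambda: True, "db2": lambda: True}))   # commit
print(two_phase_commit({"db1": lambda: True, "db2": lambda: False}))  # abort
```

Unlike a Paxos quorum, the decision here needs every participant's vote, which is why one unreachable node can stall the protocol.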
Vector Clock
A vector clock is a logical timestamp mechanism used in distributed systems to capture causal relationships between events. It enables the detection of concurrent updates, which is essential for conflict resolution in eventually consistent systems.
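- Mechanism: Each node maintains a vector (array) of counters, one for each node in the system. On a local event, a node increments its own counter. On receiving a message, it takes the element-wise maximum of its vector and the sender's, then increments its own counter.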
- Causality Detection: By comparing vectors, a system can determine if one event happened-before another, or if they are concurrent.
- Conflict Resolution: When concurrent writes are detected (neither vector is strictly greater), a conflict resolution policy (like "last writer wins" or application-specific merge) must be applied.
- Relation to Paxos: Paxos provides strong consistency, preventing conflicts. Vector clocks are used in systems that prioritize availability and partition tolerance, accepting conflicts that must be resolved later.
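The comparison rule for vector clocks can be sketched as follows. This is an illustrative sketch that represents a clock as a dictionary from node name to counter, with missing entries treated as zero; the function names are assumptions made for the example.

```python
from typing import Dict

Clock = Dict[str, int]


def tick(clock: Clock, node: str) -> Clock:
    """Return a copy of the clock with this node's counter incremented."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(a: Clock, b: Clock) -> Clock:
    """Element-wise maximum, applied when a node receives another node's clock."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


def compare(a: Clock, b: Clock) -> str:
    """Classify a relative to b: 'before', 'after', 'equal', or 'concurrent'."""
    nodes = a.keys() | b.keys()
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


a = tick({}, "A")            # event on node A
b = tick({}, "B")            # independent event on node B
c = tick(merge(a, b), "A")   # A receives B's clock, then records the receipt
print(compare(a, b))  # concurrent
print(compare(a, c))  # before
```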
Conflict-Free Replicated Data Type (CRDT)
A Conflict-Free Replicated Data Type (CRDT) is a data structure designed for distributed systems that can be replicated across multiple agents, updated concurrently without coordination, and mathematically guarantee eventual consistency.
- Core Principle: Concurrent updates are designed to commute, and in state-based designs the merge function is commutative, associative, and idempotent, so replicas reach the same state regardless of the order in which updates arrive.
- Types: Operation-based (CmRDT) requires reliable broadcast of operations. State-based (CvRDT) merges states using a monotonic join semilattice.
- Use Case: Ideal for collaborative applications (like real-time editors), decentralized databases, and scenarios where low-latency writes and high availability are critical.
- Contrast with Paxos: CRDTs avoid consensus for writes, embracing eventual consistency. Paxos is used to achieve strong, immediate consistency, often at the cost of requiring coordination and majority agreement.
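A grow-only counter (G-Counter) is one of the simplest state-based CRDTs and shows how merging without coordination works. The class below is a minimal sketch with illustrative names, not the API of any particular CRDT library.

```python
from typing import Dict


class GCounter:
    """State-based grow-only counter: one slot per replica, merged by maximum."""

    def __init__(self, node_id: str) -> None:
        self.node_id = node_id
        self.counts: Dict[str, int] = {}

    def increment(self, amount: int = 1) -> None:
        # A replica only ever increases its own slot.
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Element-wise max is commutative, associative, and idempotent.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)


a, b = GCounter("a"), GCounter("b")
a.increment(2)
b.increment(3)
a.merge(b)
b.merge(a)
print(a.value(), b.value())  # 5 5
```

Merging the same state again leaves the value unchanged, so replicas can exchange state repeatedly and in any order.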

About the author
Prasad Kumkar
CEO & MD, Inference Systems
Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over 5+ years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.
His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.