Paxos is a fault-tolerant consensus protocol that allows a collection of distributed processes to agree on a single value despite partial failures. It operates through a sequence of proposals and votes carried out by three roles: proposers, acceptors, and learners. The protocol always guarantees safety (no two correct processes decide different values), but guarantees liveness (a value is eventually chosen) only under specific network conditions. Together these properties form the theoretical basis for reliable distributed systems.
Paxos

What is Paxos?
Paxos is a foundational family of consensus algorithms that enables a distributed system of unreliable nodes to agree on a single value, even when some nodes fail or network messages are lost.
In practice, Paxos variants like Multi-Paxos optimize repeated consensus for a replicated log, which is fundamental for building strongly consistent state machines. While notoriously subtle to implement correctly, its concepts underpin many production systems for distributed coordination and data replication, providing the bedrock for ensuring all nodes in a system observe a consistent, agreed-upon sequence of state changes.
Key Properties of Paxos
Paxos is a family of protocols that solves the consensus problem in asynchronous networks where nodes may fail. Its core properties ensure a distributed system can agree on a single value despite failures.
Safety (Agreement & Validity)
The non-negotiable correctness guarantees of Paxos. Safety ensures that:
- Agreement: No two correct nodes ever decide on different values. Once a value is chosen, it is final.
- Validity: Only a value that was proposed by some node can be chosen.
These properties hold even in the presence of fail-stop node failures and message delays, preventing split-brain scenarios and guaranteeing system consistency.
Liveness (Progress)
The guarantee that the protocol will eventually make progress and choose a value, provided certain system conditions are met. In a purely asynchronous network (where messages can be delayed indefinitely), no deterministic consensus algorithm, Paxos included, can guarantee liveness if even one node may crash; this is the FLP impossibility result. In practice, liveness is achieved by:
- Using failure detectors or timeouts.
- Ensuring a quorum of non-faulty nodes can communicate.
- Having a distinguished proposer (leader) to drive progress, a pattern used in derived protocols like Multi-Paxos.
Fault Tolerance
Paxos is designed to tolerate fail-stop (crash) faults. The protocol can make progress as long as a quorum of nodes is alive and can communicate. A quorum is typically a majority of nodes (e.g., floor(N/2) + 1 out of N). This means Paxos can tolerate F failures out of N nodes, where N = 2F + 1. For example, a 5-node cluster can tolerate 2 simultaneous node failures. It is not natively designed for Byzantine faults (malicious behavior), though Byzantine Paxos variants exist.
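The quorum arithmetic above can be checked with a few lines of Python. This is a minimal sketch; the function names `majority_quorum` and `tolerated_failures` are illustrative and do not come from any Paxos library.

```python
def majority_quorum(n: int) -> int:
    """Smallest number of nodes that forms a majority of n: floor(n/2) + 1."""
    if n < 1:
        raise ValueError("cluster must have at least one node")
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Largest number of crashed nodes that still leaves a majority alive."""
    return n - majority_quorum(n)


for n in (3, 4, 5, 7):
    print(f"N={n}: quorum={majority_quorum(n)}, tolerates F={tolerated_failures(n)}")
```

Note that a 4-node cluster tolerates only one failure, the same as a 3-node cluster, which is why odd cluster sizes (N = 2F + 1) are the usual choice.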
Roles: Proposers, Acceptors, Learners
Paxos defines three logical roles, which may be colocated on the same physical nodes:
- Proposers: Receive client requests and drive the protocol by proposing values.
- Acceptors: Form the core consensus group. They vote on proposals and store the agreed-upon state. A quorum of acceptors is required for any decision.
- Learners: Learn the chosen value once consensus is reached and act upon it (e.g., execute a state machine command).
This role separation provides flexibility in system architecture and allows for optimizations like having multiple learners for scalability.
Two-Phase Protocol Structure
The classic Paxos protocol operates in two distinct phases to ensure safety despite concurrency and failures:
- Phase 1 (Prepare/Promise): A proposer sends a prepare request with a unique, monotonically increasing proposal number to a quorum of acceptors. An acceptor promises not to accept any proposal with a number less than this and replies with the highest-numbered proposal it has already accepted (if any).
- Phase 2 (Accept/Accepted): If the proposer receives promises from a quorum, it sends an accept request for a value. The value must be the one from the highest-numbered proposal reported in the promises, or its own if none were reported. A quorum of acceptors must accept it for the value to be chosen.
This structure ensures that only one value can be chosen per instance, even with multiple competing proposers.
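The acceptor's side of the two phases can be sketched as a small state machine. This is a simplified, in-memory illustration: the class name `Acceptor`, its fields, and the use of -1 for "nothing yet" are assumptions made for the example, and a real acceptor would also have to write its state to stable storage before replying.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Acceptor:
    promised: int = -1                    # highest proposal number promised so far
    accepted_n: int = -1                  # number of the last accepted proposal
    accepted_value: Optional[str] = None  # value of the last accepted proposal

    def on_prepare(self, n: int) -> Optional[Tuple[int, Optional[str]]]:
        """Phase 1: promise to ignore proposals numbered below n."""
        if n <= self.promised:
            return None                   # reject: an equal or higher number was already promised
        self.promised = n
        # Report the highest-numbered proposal accepted so far, if any.
        return (self.accepted_n, self.accepted_value)

    def on_accept(self, n: int, value: str) -> bool:
        """Phase 2: accept unless a higher-numbered prepare has been promised."""
        if n < self.promised:
            return False
        self.promised = n
        self.accepted_n = n
        self.accepted_value = value
        return True


acceptor = Acceptor()
print(acceptor.on_prepare(1))      # (-1, None): nothing accepted yet
print(acceptor.on_accept(1, "x"))  # True
print(acceptor.on_prepare(2))      # (1, 'x'): reports the earlier accepted proposal
print(acceptor.on_accept(1, "y"))  # False: proposal 1 is now stale
```

The last call shows the promise at work: once the acceptor has promised proposal 2, a late accept request for proposal 1 is refused.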
Asynchronous Network Model
Paxos is designed for an asynchronous network model, meaning it makes no timing assumptions. Messages can be lost, arbitrarily delayed, duplicated, or delivered out of order (though not corrupted). This makes Paxos highly robust for real-world networks like data centers or WANs. The trade-off is that, as per the FLP result, it cannot guarantee liveness (progress) without additional mechanisms like timeouts. Because its safety never depends on timing, Paxos remains safe even under worst-case network conditions, a critical property for building reliable distributed systems.
Frequently Asked Questions
Paxos is a foundational family of consensus protocols for fault-tolerant distributed systems. These questions address its core mechanisms, practical applications, and role in modern multi-agent and memory architectures.
Paxos is a family of protocols that enables a distributed system of unreliable nodes to agree on a single value (achieve consensus) despite the possibility of node failures, network delays, and partitions. It works through a series of numbered proposal rounds, each driven by a proposer (often a temporarily elected leader). A majority (quorum) of acceptors must first promise to honor a round and then accept its proposed value for that value to be chosen.
The protocol operates in two main phases, which may be repeated:
- Prepare/Promise Phase: A proposer sends a prepare request with a unique, increasing proposal number to a quorum of acceptors. Acceptors promise not to accept any proposal with a number lower than this and reply with the highest-numbered proposal they have already accepted (if any).
- Accept/Accepted Phase: If the proposer receives promises from a quorum, it sends an accept request for a value. This value must be the one from the highest-numbered proposal reported in the promises, or its own if none were reported. If a quorum of acceptors accepts this request, the value is chosen and consensus is achieved.
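The value-selection rule in the second phase is the part most often implemented incorrectly, so a small sketch may help. It assumes each promise is a pair of (accepted proposal number, accepted value), with `(-1, None)` meaning the acceptor has accepted nothing; the names `Promise` and `value_to_propose` are hypothetical.

```python
from typing import List, Optional, Tuple

# (accepted proposal number, accepted value); (-1, None) if nothing was accepted.
Promise = Tuple[int, Optional[str]]


def value_to_propose(promises: List[Promise], own_value: str) -> str:
    """Pick the value a proposer must send in its accept request."""
    accepted = [p for p in promises if p[1] is not None]
    if not accepted:
        return own_value                         # free to propose its own value
    # Otherwise adopt the value of the highest-numbered accepted proposal.
    return max(accepted, key=lambda p: p[0])[1]


print(value_to_propose([(-1, None), (-1, None), (-1, None)], "blue"))   # blue
print(value_to_propose([(-1, None), (3, "red"), (5, "green")], "blue")) # green
```

In the second call the proposer must give up "blue" and re-propose "green", because "green" may already have been chosen by an earlier round.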
Related Terms
Paxos is a foundational protocol for achieving agreement in unreliable distributed systems. These related concepts are essential for architects designing fault-tolerant, multi-agent memory and coordination layers.
Byzantine Fault Tolerance (BFT)
The property of a system to reach correct consensus despite arbitrary (potentially malicious) component failures. While classic Paxos assumes non-Byzantine (crash-stop) faults, Practical Byzantine Fault Tolerance (PBFT) and its variants extend the consensus problem. Key distinctions:
- Adversarial Model: Handles nodes that may lie, delay, or send conflicting messages.
- Complexity: Requires more message exchanges and cryptographic authentication of messages (e.g., digital signatures).
- Use Case: Critical for blockchain networks (e.g., Tendermint) and high-security multi-agent systems where agents cannot be fully trusted.
Two-Phase Commit (2PC)
A protocol for achieving atomicity across distributed nodes in a database transaction, not general consensus. It coordinates participants to either all commit or all abort. Process:
- Prepare Phase: The coordinator asks all participants if they can commit.
- Commit Phase: If all vote 'yes', the coordinator instructs a commit; otherwise, it instructs an abort.
Contrast with Paxos: 2PC is a blocking protocol. If the coordinator fails after participants have voted, they must wait for it to recover. Paxos, by contrast, does not depend on any single node: any proposer can take over, and progress needs only a quorum of acceptors. 2PC is used for ACID transactions across shards, not for agreeing on a sequence of values.
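The coordinator's decision rule in 2PC can be sketched in a few lines. This is a simplified illustration: participants are modeled as plain callables that return their vote, the name `two_phase_commit` is hypothetical, and treating an exception as a 'no' vote is an assumption of the sketch. It deliberately omits logging, timeouts, and recovery.

```python
from typing import Callable, Dict


def two_phase_commit(participants: Dict[str, Callable[[], bool]]) -> str:
    """Return 'commit' only if every participant votes yes in the prepare phase."""
    votes = {}
    for name, can_commit in participants.items():
        try:
            votes[name] = can_commit()  # prepare phase: ask for a vote
        except Exception:
            votes[name] = False         # assumption: an unreachable participant counts as 'no'
    # Commit phase: a single 'no' vote forces a global abort.
    return "commit" if all(votes.values()) else "abort"


print(two_phase_commit({"db1": lambda: True, "db2": lambda: True}))   # commit
print(two_phase_commit({"db1": lambda: True, "db2": lambda: False}))  # abort
```

Unlike a Paxos quorum, the rule requires unanimity, which is why one slow or failed participant stalls the whole transaction.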
Eventual Consistency
A consistency model where, if no new updates are made, all reads will eventually return the last written value. It does not guarantee immediate uniformity. This is a weaker guarantee than the strong consistency provided by Paxos. Common in:
- DNS: Updates propagate slowly.
- Multi-leader replication: Conflicts are resolved asynchronously.
Paxos is used to implement stronger models (like linearizability) for critical state, while eventual consistency is often sufficient for cached data or non-critical agent state in large-scale systems.
Memory Quorum
A fundamental technique in distributed systems where an operation must receive successful responses from a minimum subset of nodes to be considered valid. Paxos uses quorums to ensure safety and liveness despite node failures.
- Write Quorum (W): Number of nodes that must acknowledge a write.
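- Read Quorum (R): Number of nodes queried for a read.
Setting R + W > N (where N is the replication factor) guarantees that every read quorum shares at least one node with every write quorum, so a read always reaches a node holding the latest acknowledged write. This overlap is the basis for strong consistency and a building block for replicated state machines and distributed locks, which are key for coordinating access to shared agent memory.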
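The overlap condition can be verified by brute force for small clusters. The sketch below enumerates every possible read and write quorum and checks that each pair shares a node; `quorums_always_overlap` is an illustrative name, and the approach is only practical for small N.

```python
from itertools import combinations


def quorums_always_overlap(n: int, r: int, w: int) -> bool:
    """True if every read quorum of size r intersects every write quorum of size w."""
    nodes = range(n)
    return all(
        set(read_q) & set(write_q)
        for read_q in combinations(nodes, r)
        for write_q in combinations(nodes, w)
    )


print(quorums_always_overlap(5, 3, 3))  # True:  3 + 3 > 5
print(quorums_always_overlap(5, 2, 3))  # False: 2 + 3 = 5, disjoint quorums exist
```

With N = 5, R = 2, W = 3, a write acknowledged by nodes {2, 3, 4} can be missed entirely by a read that queries nodes {0, 1}.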

About the author
Prasad Kumkar
CEO & MD, Inference Systems
Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over 5+ years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.
His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.