Inferensys

Paxos

Paxos is a foundational family of consensus protocols that enables a collection of distributed nodes to reliably agree on a single value, even when some nodes fail or experience network delays.

What is Paxos?

Paxos is a fault-tolerant consensus protocol that allows a collection of distributed processes to agree on a single value despite partial failures. It operates through a sequence of numbered proposals and votes exchanged among three roles: proposers, acceptors, and learners. The protocol always guarantees safety (no two processes decide different values), and it guarantees liveness (a value is eventually chosen) only under additional conditions, such as a quorum of nodes that can communicate in a timely way. It forms the theoretical basis for many reliable distributed systems.

In practice, Paxos variants like Multi-Paxos optimize repeated consensus for a replicated log, which is fundamental for building strongly consistent state machines. Although Paxos is notoriously subtle to implement correctly, its concepts underpin many production systems for distributed coordination and data replication, because they ensure that all nodes observe the same agreed-upon sequence of state changes.
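
To make the replicated-log idea concrete, here is a minimal sketch of the consuming side, assuming each log slot is decided by its own Paxos instance. The `ReplicatedLog` class and its `on_chosen` method are hypothetical names used for illustration, not part of any particular library. The point it shows is that commands are applied strictly in slot order, so a later slot waits until every earlier slot has been decided.

```python
from typing import Callable, Dict, List


class ReplicatedLog:
    """Applies chosen commands strictly in slot order, never skipping a gap."""

    def __init__(self, apply: Callable[[str], None]) -> None:
        self._chosen: Dict[int, str] = {}  # slot -> command chosen for that slot
        self._next_slot = 0                # lowest slot not yet applied
        self._apply = apply

    def on_chosen(self, slot: int, command: str) -> None:
        """Record that consensus for `slot` chose `command`, then apply what is ready."""
        existing = self._chosen.get(slot)
        if existing is not None and existing != command:
            # Consensus guarantees this cannot happen; treat it as a bug.
            raise ValueError(f"slot {slot} already chose a different command")
        self._chosen[slot] = command
        # Apply every consecutive decided slot starting at the first unapplied one.
        while self._next_slot in self._chosen:
            self._apply(self._chosen[self._next_slot])
            self._next_slot += 1


applied: List[str] = []
log = ReplicatedLog(applied.append)
log.on_chosen(1, "set x=2")  # slot 0 is still undecided, so nothing is applied yet
log.on_chosen(0, "set x=1")  # gap filled: slots 0 and 1 are now applied in order
```

Every replica that runs this loop over the same chosen slots applies the same commands in the same order, which is what keeps their state machines identical.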

Key Properties of Paxos

Paxos is a family of protocols that solves the consensus problem in asynchronous networks where nodes may fail. Its core properties ensure a distributed system can agree on a single value despite failures.

01

Safety (Agreement & Validity)

Safety is the non-negotiable correctness guarantee of Paxos. It ensures:

  • Agreement: No two nodes ever decide on different values. Once a value is chosen, it is final.
  • Validity: Only a value that was proposed by some node can be chosen.

These properties hold regardless of crash failures, message loss, or message delays, which is what prevents split-brain scenarios and keeps the system consistent.
02

Liveness (Progress)

Liveness is the guarantee that the protocol eventually makes progress and chooses a value, provided certain system conditions are met. In a purely asynchronous network, where messages can be delayed indefinitely, no deterministic consensus algorithm can guarantee termination if even one node may crash. This is the FLP impossibility result, and Paxos is subject to it. In practice, liveness is achieved by:

  • Using failure detectors or timeouts.
  • Ensuring a quorum of non-faulty nodes can communicate.
  • Having a distinguished proposer (leader) to drive progress, a pattern used in derived protocols like Multi-Paxos.
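
One way the timeout mechanism above is often realized is randomized backoff: when a proposer's round fails, it waits a random interval before retrying, so that competing proposers are unlikely to keep preempting each other. The sketch below is an assumption about how such a timer might look, not something prescribed by Paxos itself; the function name `retry_delay` and the `base` and `cap` parameters are hypothetical.

```python
import random
from typing import Optional


def retry_delay(attempt: int, base: float = 0.05, cap: float = 2.0,
                rng: Optional[random.Random] = None) -> float:
    """Seconds a proposer waits before retrying after a failed round.

    The ceiling doubles with each failed attempt (up to `cap`), and the actual
    delay is drawn uniformly below it so competing proposers desynchronize.
    """
    if attempt < 0:
        raise ValueError("attempt must be non-negative")
    rng = rng or random.Random()
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, ceiling)


rng = random.Random(42)  # seeded only to make the illustration repeatable
delays = [retry_delay(attempt, rng=rng) for attempt in range(6)]
```

Backoff affects only how quickly a value is chosen. Safety does not depend on the delays being well tuned.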
03

Fault Tolerance

Paxos is designed to tolerate crash faults, in which a node stops responding but never sends incorrect messages. The protocol can make progress as long as a quorum of nodes is alive and can communicate. A quorum is typically a majority of nodes, that is, floor(N/2) + 1 out of N, so tolerating F simultaneous failures requires at least N = 2F + 1 nodes. For example, a 5-node cluster can tolerate 2 simultaneous node failures. Paxos is not designed for Byzantine faults (arbitrary or malicious behavior), though Byzantine Paxos variants exist.
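
The quorum arithmetic can be checked with a few lines of Python. This is a small illustrative sketch; the helper names `quorum_size`, `max_crash_failures`, and `majorities_intersect` are made up for this example. The last helper brute-forces the property that majority quorums rely on: any two majorities share at least one node.

```python
from itertools import combinations


def quorum_size(n: int) -> int:
    """Smallest majority of n nodes: floor(n/2) + 1."""
    if n < 1:
        raise ValueError("a cluster needs at least one node")
    return n // 2 + 1


def max_crash_failures(n: int) -> int:
    """Largest number of crashed nodes that still leaves a majority alive."""
    return n - quorum_size(n)


def majorities_intersect(n: int) -> bool:
    """Brute-force check that every pair of majority quorums shares a node."""
    q = quorum_size(n)
    quorums = [set(c) for c in combinations(range(n), q)]
    return all(a & b for a in quorums for b in quorums)


for n in (3, 4, 5, 7):
    print(f"N={n}: quorum={quorum_size(n)}, tolerates F={max_crash_failures(n)}")
```

Note that a 4-node cluster tolerates only one failure, the same as a 3-node cluster, which is why odd cluster sizes are the usual choice.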

04

Roles: Proposers, Acceptors, Learners

Paxos defines three logical roles, which may be colocated on the same physical nodes:

  • Proposers: Receive client requests and drive the protocol by proposing values.
  • Acceptors: Form the core consensus group. They vote on proposals and remember the highest proposal number they have promised and the last proposal they accepted. A quorum of acceptors is required for any decision.
  • Learners: Learn the chosen value once consensus is reached and act on it (e.g., execute a state machine command).

This role separation provides flexibility in system architecture and allows optimizations such as running multiple learners for scalability.
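
As an illustration of the learner role, the sketch below counts "accepted" notifications and declares a value chosen once a majority of acceptors has accepted the same proposal. The `Learner` class, its `on_accepted` method, and the string acceptor IDs are hypothetical; how notifications actually reach a learner (directly from acceptors or via a distinguished learner) is left out.

```python
from collections import defaultdict
from typing import Dict, Hashable, Optional, Set, Tuple


class Learner:
    """Learns a value once a majority of acceptors accepted the same proposal."""

    def __init__(self, num_acceptors: int) -> None:
        self._quorum = num_acceptors // 2 + 1
        # (proposal number, value) -> IDs of acceptors that accepted it
        self._votes: Dict[Tuple[int, Hashable], Set[str]] = defaultdict(set)
        self.chosen: Optional[Hashable] = None

    def on_accepted(self, acceptor_id: str, number: int, value: Hashable) -> None:
        """Handle one notification; duplicates are harmless because IDs are kept in a set."""
        voters = self._votes[(number, value)]
        voters.add(acceptor_id)
        if self.chosen is None and len(voters) >= self._quorum:
            self.chosen = value


learner = Learner(num_acceptors=3)
learner.on_accepted("a1", 1, "x")
learner.on_accepted("a1", 1, "x")  # duplicate delivery: still only one vote
learner.on_accepted("a2", 1, "x")  # second distinct acceptor: majority of 3 reached
```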
05

Two-Phase Protocol Structure

The classic Paxos protocol operates in two distinct phases to ensure safety despite concurrency and failures:

  • Phase 1 (Prepare/Promise): A proposer sends a prepare request with a unique proposal number, higher than any it has used before, to a quorum of acceptors. An acceptor that has not already promised a higher number promises not to accept any proposal numbered lower, and replies with the highest-numbered proposal it has already accepted (if any).
  • Phase 2 (Accept/Accepted): If the proposer receives promises from a quorum, it sends an accept request carrying the same proposal number and a value. The value must be the one from the highest-numbered proposal reported in the promises, or the proposer's own if none was reported. The value is chosen once a quorum of acceptors accepts the request.

This structure ensures that only one value can be chosen per instance, even with multiple competing proposers.
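
The acceptor's side of the two phases can be sketched in a few lines. This is a minimal in-memory illustration under stated assumptions: the `Acceptor` class and its method names are hypothetical, proposal numbers are plain integers, and durability is omitted (a real acceptor would need to persist its state before replying).

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple

Proposal = Tuple[int, Any]  # (proposal number, value)


@dataclass
class Acceptor:
    promised: int = -1                   # highest proposal number promised so far
    accepted: Optional[Proposal] = None  # last proposal accepted, if any

    def on_prepare(self, n: int) -> Optional[Tuple[int, Optional[Proposal]]]:
        """Phase 1: promise n if it exceeds every earlier promise; None means rejected."""
        if n <= self.promised:
            return None
        self.promised = n
        return (n, self.accepted)  # report any previously accepted proposal

    def on_accept(self, n: int, value: Any) -> bool:
        """Phase 2: accept unless a higher-numbered proposal has been promised."""
        if n < self.promised:
            return False
        self.promised = n
        self.accepted = (n, value)
        return True


acceptor = Acceptor()
first_promise = acceptor.on_prepare(1)      # nothing accepted yet
accepted_1 = acceptor.on_accept(1, "x")     # accepts proposal 1 with value "x"
second_promise = acceptor.on_prepare(2)     # reports the earlier accepted proposal
stale_accept = acceptor.on_accept(1, "y")   # rejected: 2 has been promised
```

Reporting the previously accepted proposal in the promise is what lets a later proposer discover a value that may already have been chosen and re-propose it instead of its own.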
06

Asynchronous Network Model

Paxos is designed for an asynchronous network model, meaning its safety does not depend on any timing assumptions. Messages can be arbitrarily delayed, lost, duplicated, or delivered out of order (though not corrupted), which matches the behavior of real-world networks such as data centers and WANs. The trade-off is that, as per the FLP result, it cannot guarantee liveness (progress) without additional mechanisms like timeouts. Because safety never relies on timing, Paxos stays consistent under worst-case network conditions, a critical property for building reliable distributed systems.

Frequently Asked Questions

Paxos is a foundational family of consensus protocols for fault-tolerant distributed systems. These questions address its core mechanisms, practical applications, and role in modern multi-agent and memory architectures.

Paxos is a family of protocols that enables a distributed system of unreliable nodes to agree on a single value (achieve consensus) despite node failures, network delays, and partitions. It works through a series of numbered proposal rounds driven by proposers, often a single elected leader in practice. For a value to be chosen, a majority (quorum) of acceptors must first promise to ignore lower-numbered proposals and then accept the proposed value.

The protocol operates in two main phases, which may be repeated:

  1. Prepare/Promise Phase: A proposer sends a prepare request with a unique, increasing proposal number to a quorum of acceptors. Acceptors promise not to accept any proposal with a number lower than this and reply with the highest-numbered proposal they have already accepted (if any).
  2. Accept/Accepted Phase: If the proposer receives promises from a quorum, it sends an accept request for a value. This value must be the one from the highest-numbered proposal reported in the promises, or its own if none were reported. If a quorum of acceptors accepts this request, the value is chosen and consensus is achieved.
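
The two phases above can be traced end to end with a small in-process simulation. This is a sketch under simplifying assumptions: acceptors are plain objects called synchronously (no real network, no failures, no persistence), proposal numbers are integers supplied by the caller, and the names `Acceptor` and `propose` are hypothetical.

```python
from typing import Any, List, Optional, Tuple

Accepted = Optional[Tuple[int, Any]]  # (proposal number, value) or None


class Acceptor:
    def __init__(self) -> None:
        self.promised = -1            # highest proposal number promised
        self.accepted: Accepted = None

    def prepare(self, n: int) -> Tuple[bool, Accepted]:
        if n > self.promised:
            self.promised = n
            return True, self.accepted
        return False, None

    def accept(self, n: int, value: Any) -> bool:
        if n >= self.promised:
            self.promised = n
            self.accepted = (n, value)
            return True
        return False


def propose(acceptors: List[Acceptor], n: int, value: Any) -> Optional[Any]:
    """Run one round; return the chosen value, or None if the round failed."""
    quorum = len(acceptors) // 2 + 1

    # Phase 1: collect promises along with any previously accepted proposals.
    promises = [prior for ok, prior in (a.prepare(n) for a in acceptors) if ok]
    if len(promises) < quorum:
        return None

    # Adopt the value of the highest-numbered reported proposal, if there is one.
    reported = [p for p in promises if p is not None]
    if reported:
        value = max(reported, key=lambda p: p[0])[1]

    # Phase 2: the value is chosen if a quorum accepts it.
    acks = sum(a.accept(n, value) for a in acceptors)
    return value if acks >= quorum else None


cluster = [Acceptor() for _ in range(3)]
first = propose(cluster, 1, "A")   # no prior proposals, so "A" is chosen
second = propose(cluster, 2, "B")  # must adopt "A", which was already accepted
stale = propose(cluster, 1, "C")   # number too low: no promises, round fails
```

The second call illustrates the safety rule: once "A" has been chosen, a later proposer with a different value ends up re-proposing "A" rather than overwriting it.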

About the author

Prasad Kumkar

CEO & MD, Inference Systems

Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over more than five years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.

His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.