Inferensys

Glossary

CAP Theorem

The CAP theorem is a fundamental principle in distributed computing stating that a distributed data store can provide only two of three guarantees: Consistency, Availability, and Partition tolerance.
Data engineer managing feature store on laptop, feature definitions visible, casual data engineering session.
DISTRIBUTED SYSTEMS

What is the CAP Theorem?

The CAP theorem is a fundamental principle in distributed computing that defines the inherent trade-offs in designing networked data stores.

The CAP theorem (Consistency, Availability, Partition tolerance) is a foundational principle in distributed systems stating that a networked data store can only simultaneously guarantee two of three properties: Consistency (all nodes see the same data at the same time), Availability (every request receives a response), and Partition tolerance (the system continues operating despite network failures). It is a trade-off law, not a choice, as partition tolerance is unavoidable in real-world networks, forcing a design choice between consistency and availability during a partition.

In practice, the theorem guides system architecture and database selection. A CP system (Consistency, Partition tolerance) prioritizes data correctness over uptime, sacrificing availability to maintain consistency across nodes. An AP system (Availability, Partition tolerance) ensures the system remains responsive, potentially serving stale data, to maintain availability. Modern databases often provide configurable consistency levels and tunable latencies, allowing engineers to adjust the trade-off dynamically based on specific application requirements and failure scenarios.

CAP THEOREM

The Three Guarantees of CAP

The CAP theorem, proposed by computer scientist Eric Brewer, is a fundamental principle in distributed systems theory. It posits that a distributed data store can simultaneously provide only two of three critical guarantees when a network partition occurs.

01

Consistency (C)

Consistency means that every read operation receives the most recent write or an error. In a consistent system, all nodes see the same data at the same time. This is often referred to as linearizability or atomic consistency.

  • Mechanism: Achieved through coordination protocols like two-phase commit (2PC) or consensus algorithms like Raft and Paxos.
  • Trade-off: Guaranteeing consistency often requires nodes to block or delay responses until data is synchronized, potentially impacting availability.
  • Example: A financial database ensuring that a balance transfer is immediately reflected across all replicas before confirming the transaction to the user.
02

Availability (A)

Availability means that every request (read or write) receives a (non-error) response, without guarantee that it contains the most recent write. The system remains operational for both reads and writes even during failures.

  • Mechanism: Achieved by allowing nodes to operate independently, often serving potentially stale data from local replicas.
  • Trade-off: High availability can lead to eventual consistency, where different nodes may return different data for a short period.
  • Example: A social media timeline that always loads quickly, even if it occasionally shows a slightly outdated post count or like tally during a network issue.
03

Partition Tolerance (P)

Partition tolerance means the system continues to operate despite an arbitrary number of network messages being dropped (or delayed) between nodes. A network partition is a break in communication, splitting the network into isolated subgroups.

  • Fundamental Constraint: In a distributed system, network partitions are a guaranteed eventual failure mode. Therefore, partition tolerance (P) is non-negotiable for any practical wide-area network system.
  • Implication: The true choice in system design is between Consistency and Availability when a partition occurs (CP vs. AP).
  • Example: A globally distributed database that can handle an undersea cable break, allowing regional data centers to continue serving requests independently, even if they cannot communicate with each other.
04

CP Systems (Consistency & Partition Tolerance)

CP systems prioritize consistency over availability during a network partition. If nodes cannot communicate to guarantee consistency, the system will return an error or become unavailable for writes/reads on the affected data.

  • Typical Use Cases: Financial systems (e.g., stock trades, bank account balances), distributed locking services, and metadata coordination where correctness is paramount.
  • Examples: Google Spanner, etcd, ZooKeeper, and traditional relational databases with synchronous replication.
  • Behavior on Partition: A node may refuse a write request if it cannot achieve a quorum to replicate the data consistently, preserving the atomicity of operations.
05

AP Systems (Availability & Partition Tolerance)

AP systems prioritize availability over strict consistency during a network partition. All nodes remain responsive, but they may serve stale or divergent data. The system provides eventual consistency.

  • Typical Use Cases: Social media platforms, e-commerce product catalogs, DNS, and real-time collaborative applications where uninterrupted service is more critical than immediate uniformity.
  • Examples: Amazon DynamoDB, Apache Cassandra, Riak, and CouchDB.
  • Behavior on Partition: Nodes in each partition continue to accept reads and writes locally. When the partition heals, a conflict resolution mechanism (like last-write-wins or application-defined logic) merges the divergent data states.
06

CA Systems (Theoretical & Localized)

A CA system provides both Consistency and Availability, but only in the absence of network partitions. This is effectively a single-node or tightly-coupled cluster system where partition tolerance is not a design consideration.

  • Reality Check: In a true distributed system across multiple failure domains (e.g., different data centers), network partitions are inevitable. Therefore, a CA distributed system is a practical impossibility.
  • Localized Examples: A single PostgreSQL database server, or a Redis cluster running within a single rack with a perfect network. The moment the system is stretched across a wide-area network, it must choose CP or AP behavior for partition scenarios.
  • Misconception: The CAP theorem is often misinterpreted as "choose 2 out of 3 at all times." In reality, it's about the trade-off enforced during a partition.
ARCHITECTURAL PATTERNS

CAP Trade-Offs: System Archetypes

This table compares the primary distributed system design patterns derived from the CAP theorem, detailing their trade-offs, typical use cases, and implementation characteristics.

FeatureCP System (Consistency & Partition Tolerance)AP System (Availability & Partition Tolerance)CA System (Consistency & Availability)

Primary Guarantee Sacrificed

Availability (A)

Consistency (C)

Partition Tolerance (P)

Consistency Model

Strong, Linearizable

Eventual

Strong, Immediate

Availability During Network Partition

Typical Data Store

CP Database (e.g., etcd, ZooKeeper, traditional RDBMS with synchronous replication)

AP Database (e.g., Cassandra, DynamoDB, Riak)

Single-node or tightly-coupled cluster database (e.g., standalone PostgreSQL, MySQL)

Read/Write Latency During Normal Operation

Higher (due to coordination)

Lower (writes often acknowledged locally)

Lowest (no network coordination overhead)

Conflict Resolution

Via consensus (e.g., Paxos, Raft); avoids conflicts.

Via application logic or last-write-wins (LWW); resolves conflicts later.

Not applicable; single source of truth.

System Scale-Out

Challenging; coordination overhead increases with nodes.

Horizontally scalable; nodes operate largely independently.

Vertically scalable (scale-up); horizontal scaling is limited.

Fault Tolerance Model

Tolerates node failures but may become unavailable if quorum is lost.

Tolerates node and network failures; remains operational but may return stale data.

Limited; a node failure can cause full outage unless failover is manual.

Example Use Case

Financial transaction ledger, cluster coordination, system configuration.

Shopping cart, social media feeds, IoT sensor data aggregation.

Legacy monolithic application database, small-scale internal tooling.

CAP THEOREM

Practical Implications and Modern Interpretations

The CAP theorem, a foundational principle in distributed systems, is often misinterpreted as a strict three-way trade-off. Modern engineering practices reveal a more nuanced reality focused on tunable consistency and pragmatic availability.

The CAP theorem is a trade-off constraint, not an absolute law. In practice, engineers design for partition tolerance (P) as a non-negotiable requirement for networked systems, then make deliberate choices between consistency (C) and availability (A). This manifests as selecting a consistency model—like strong, eventual, or causal consistency—tailored to the application's tolerance for stale data. Modern databases often provide configurable knobs, such as write/read consistency levels or quorum settings, allowing developers to tune the C-A balance per operation.

Contemporary interpretations emphasize that the choice is not binary but a spectrum of latency-consistency trade-offs. Techniques like conflict-free replicated data types (CRDTs) and operational transformation enable high availability with strong eventual consistency. Furthermore, the PACELC theorem extends CAP by acknowledging that during normal operation (without partitions), the trade-off is between Latency (L) and Consistency (C). This framework guides the design of globally distributed systems where engineers strategically sacrifice perfect consistency for lower latency and higher availability in most scenarios.

CAP THEOREM

Frequently Asked Questions

The CAP theorem is a fundamental principle in distributed systems theory that defines the inherent trade-offs between three core guarantees. These FAQs address its practical implications for architects designing resilient, self-healing software systems.

The CAP theorem is a fundamental principle in distributed computing which states that a distributed data store can only simultaneously provide two out of the following three guarantees: Consistency, Availability, and Partition tolerance. It establishes that in the presence of a network partition—a failure that prevents communication between nodes—a system designer must choose between maintaining perfect consistency or perfect availability, but cannot have both. This theorem, formalized by Eric Brewer in 2000 and later proven by Seth Gilbert and Nancy Lynch, creates a foundational trade-off that dictates the architecture of modern databases and distributed systems, forcing engineers to prioritize based on their application's specific requirements for data accuracy versus system uptime.

Prasad Kumkar

About the author

Prasad Kumkar

CEO & MD, Inference Systems

Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over 5+ years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.

His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.