Compare-and-Swap (CAS) is an atomic hardware instruction that updates a memory location to a new value only if its current value matches an expected value, returning a boolean indicating success or failure. This read-modify-write operation is the cornerstone of non-blocking algorithms, enabling concurrent data structure updates without traditional locks. It is a key primitive for implementing lock-free and wait-free synchronization in systems programming and parallel computing frameworks.
Glossary
Compare-and-Swap (CAS)

What is Compare-and-Swap (CAS)?
A foundational atomic instruction for lock-free and wait-free concurrent programming.
In practice, CAS is used to build atomic counters, lock-free stacks, queues, and hash maps. Its implementation relies on low-level CPU support, such as the cmpxchg instruction on x86. A failed CAS operation, often due to a race condition where another thread modified the value, typically triggers a retry loop. This pattern is fundamental to optimistic concurrency control, where threads proceed assuming no conflict and roll back via CAS failure, contrasting with pessimistic locking mechanisms like mutexes.
Key Characteristics of CAS
Compare-and-Swap (CAS) is a fundamental atomic instruction for lock-free and wait-free concurrent programming. It is the cornerstone for implementing non-blocking data structures and synchronization primitives.
Atomic Read-Modify-Write
A Compare-and-Swap operation performs three actions as a single, indivisible atomic unit:
- Reads the current value from a shared memory location.
- Compares this value to an expected value provided by the calling thread.
- Conditionally writes a new value only if the comparison succeeds.
This atomicity is guaranteed by the hardware (e.g., via a CPU instruction like CMPXCHG on x86) and prevents the classic lost update problem that occurs when interleaved reads and writes from multiple threads corrupt shared state.
Lock-Free Synchronization Foundation
CAS enables lock-free (and often wait-free) algorithms. Unlike a mutex, which suspends (blocks) a thread if the lock is held, a thread using CAS in a loop continually retries the operation without being descheduled by the OS.
Key advantages:
- Avoids deadlock: No holding of locks, eliminating circular wait conditions.
- Improves scalability: Under contention, threads make progress via retries rather than blocking, which is costly with many threads.
- Priority inversion resistance: A high-priority thread is not blocked indefinitely by a low-priority thread holding a lock.
It is the primary primitive for building non-blocking stacks, queues, and reference-counting systems like std::shared_ptr.
The ABA Problem
The ABA problem is a classic challenge when using CAS. It occurs when a location's value changes from A to B and back to A between the time a thread reads it and attempts the CAS. The CAS will succeed, but the state of the shared data structure may have changed invalidly.
Example: A thread reads the head pointer A of a lock-free stack. Another thread pops A, deallocates it, allocates a new node that coincidentally has the same address A, and pushes it. The first thread's CAS (expected=A, new=X) succeeds, corrupting the stack.
Common solutions:
- Tagged pointers: Use a spare bits (a version counter or tag) alongside the pointer. The combined value changes even if the address repeats.
- Hazard pointers: Track which pointers are in use to prevent premature reclamation.
Hardware and Language Support
CAS is implemented directly in CPU instruction sets and exposed through high-level language intrinsics and libraries.
Hardware Instructions:
- x86/x86-64:
CMPXCHG(compare and exchange),CMPXCHG8B,CMPXCHG16B. - ARM:
LDREX/STREX(Load-Exclusive/Store-Exclusive) instruction pair. - RISC-V:
LR.W/SC.W(Load-Reserved/Store-Conditional).
Language & Library APIs:
- C/C++:
std::atomic_compare_exchange_strongand_weakin<atomic>. - Java:
java.util.concurrent.atomic.AtomicInteger.compareAndSet(). - Rust:
AtomicPtr::compare_exchangeinstd::sync::atomic.
These abstractions allow developers to write portable lock-free code without writing assembly.
Memory Ordering Semantics
A CAS operation is not just an atomic update; it also acts as a memory barrier or fence, enforcing ordering constraints on memory operations around it. This is critical for correctness in weak memory models (like those of ARM and PowerPC).
When a CAS succeeds, it typically has release semantics for the write, making prior writes visible to other threads. It also has acquire semantics for the read, ensuring subsequent reads by this thread see the latest data.
Example in C++:
std::atomic<int> val; int expected = val.load(std::memory_order_relaxed); do { // ... compute newVal ... } while (!val.compare_exchange_weak(expected, newVal, std::memory_order_acq_rel, std::memory_order_acquire));
The chosen memory orders (acq_rel for success, acquire for failure) prevent dangerous reordering of instructions around the CAS loop.
Contrast with Related Primitives
CAS is one of several atomic Read-Modify-Write (RMW) operations. Understanding its relation to others clarifies its use cases.
- Fetch-and-Add (FAA): Atomically adds a value and returns the old value. Ideal for uncontended counters. Simpler than CAS for increment/decrement but cannot implement complex conditional updates.
- Load-Linked / Store-Conditional (LL/SC): A more flexible pair of instructions used in ARM/RISC-V.
LLloads a value and monitors the address. A subsequentSCsucceeds only if no other store occurred to that address. This can avoid the ABA problem more naturally but may suffer from spurious failures. - Test-and-Set (TAS): An older primitive that atomically sets a bit and returns its old value. Used to build simple locks (spinlocks) but is less versatile for complex lock-free structures compared to CAS.
CAS provides a powerful balance of flexibility and widespread hardware support, making it the default choice for most non-blocking algorithm designs.
CAS vs. Other Synchronization Primitives
A comparison of atomic Compare-and-Swap with traditional blocking and non-blocking synchronization constructs, highlighting their mechanisms, guarantees, and typical use cases in parallel systems.
| Feature / Mechanism | Compare-and-Swap (CAS) | Mutex (Lock) | Semaphore | Condition Variable |
|---|---|---|---|---|
Synchronization Type | Non-blocking (lock-free/wait-free building block) | Blocking (mutual exclusion) | Blocking (counting/resource control) | Blocking (event notification) |
Primary Guarantee | Atomic read-modify-write of a single memory location. | Mutual exclusion for a critical section. | Controls access for N concurrent threads to a resource pool. | Allows threads to wait for a predicate to become true. |
Progress Guarantee | System-wide progress (lock-free when used correctly). | No progress guarantee if holder is descheduled; can lead to deadlock. | No progress guarantee; can lead to deadlock if improperly used. | No progress guarantee; requires correct signaling to avoid missed wake-ups. |
Typical Use Case | Implementing lock-free data structures (queues, stacks, counters). | Protecting short, simple critical sections with low contention. | Managing a finite pool of resources (e.g., database connections, threads). | Coordinating producer-consumer patterns or complex state changes. |
Contention Performance (High) | Good (spins/retries, but avoids context switches). | Poor (context switches, scheduler involvement). | Poor (context switches, scheduler involvement). | Poor (context switches, scheduler involvement). |
Memory Overhead | Low (single memory location). | Medium (kernel object + associated metadata). | Medium (kernel object + count). | High (requires associated mutex and predicate state). |
Composability Risk | High (requires careful algorithm design to avoid ABA problem). | Medium (risk of deadlock with multiple locks). | Low (simple count-based API). | High (complex to use correctly; prone to subtle bugs). |
Applicable Scope | Fine-grained, single-variable updates. | Coarse-grained, arbitrary code sections. | Resource pool access control. | Complex multi-threaded state coordination. |
Frequently Asked Questions
Compare-and-Swap (CAS) is a foundational atomic operation for lock-free and wait-free concurrent programming. This FAQ addresses its core mechanics, applications, and relationship to other parallelism concepts.
Compare-and-Swap (CAS) is an atomic instruction used in concurrent programming that updates a memory location only if its current value matches an expected value, returning a boolean indicating success. The operation is performed as a single, uninterruptible hardware instruction, preventing race conditions between threads. Its typical signature is CAS(memory_location, expected_value, new_value). The processor reads the value at memory_location, compares it to expected_value, and if they match, atomically writes new_value. It returns true on success; if the values differ (meaning another thread modified the location), it returns false and the calling thread must retry or follow an alternative logic path. This "optimistic concurrency" approach forms the basis for non-blocking data structures.
Enabling Efficiency, Speed & Accuracy
Intelligent Analysis, Decision & Execution
We build AI systems for teams that need search across company data, workflow automation across tools, or AI features inside products and internal software.
Talk to Us
Search across company data
Give teams answers from docs, tickets, runbooks, and product data with sources and permissions.
Useful when people spend too long searching or get different answers from different systems.

Automate internal workflows
Use AI to route work, draft outputs, trigger actions, and keep approvals and logs in place.
Useful when repetitive work moves across multiple tools and teams.

Add AI to products and internal tools
Build assistants, guided actions, or decision support into the software your team or customers already use.
Useful when AI needs to be part of the product, not a separate tool.
Related Terms
Compare-and-Swap (CAS) is a fundamental building block for non-blocking algorithms. These related concepts define the broader landscape of concurrent programming and hardware-level synchronization.
Atomic Operations
Atomic operations are indivisible read-modify-write instructions that complete without interruption from other threads. They are the hardware foundation for CAS and other synchronization primitives.
- Key Property: Guarantees that the operation appears to execute instantaneously from the perspective of other threads.
- Hardware Support: Implemented via CPU instructions like
LOCK CMPXCHG(x86) orLDREX/STREX(ARM). - Common Types: Includes operations like fetch-and-add, swap, and compare-and-swap. They are used to build higher-level constructs like spinlocks and lock-free data structures.
Lock-Free Algorithm
A lock-free algorithm is a non-blocking concurrent algorithm that guarantees system-wide progress. At least one thread will complete its operation in a finite number of steps, regardless of the state or speed of other threads.
- Progress Guarantee: Prevents deadlock and priority inversion. Threads may starve individually, but the system as a whole advances.
- Implementation Basis: Typically built using atomic operations like CAS to manage state transitions.
- Example: A lock-free queue uses CAS to atomically update the head or tail pointer, allowing concurrent enqueue and dequeue operations without a global mutex.
Memory Barrier (Fence)
A memory barrier or memory fence is a type of instruction that enforces ordering constraints on memory operations issued before and after the barrier. It is crucial for correct synchronization in systems with weak memory consistency models.
- Function: Prevents the compiler and CPU from reordering loads and stores across the barrier, ensuring visibility of writes to other threads.
- Relation to CAS: CAS operations typically imply a full memory barrier, ensuring the atomic update is globally visible. Weaker forms (e.g.,
compare_exchange_weakin C++) may allow more reordering for performance. - Types: Includes acquire (subsequent reads cannot move before the barrier), release (prior writes cannot move after the barrier), and full barriers.
ABA Problem
The ABA problem is a classic challenge in lock-free programming where a location's value changes from A to B and back to A between a thread's read and its CAS attempt, causing the CAS to succeed incorrectly.
- Cause: The CAS only checks for value equality, not for whether the value is the same instance of A. This can break algorithms that rely on pointer stability.
- Mitigation Strategies:
- Tagged Pointers: Use a spare bits in the pointer (e.g., a version counter or tag) that is incremented on each update. The CAS checks both the pointer and the tag.
- Hazard Pointers: Track pointers that are in use to prevent reclamation and reuse.
- Impact: A key consideration when designing CAS-based data structures like stacks and queues.
Mutex (Mutual Exclusion)
A mutex is a synchronization primitive that enforces mutual exclusion, allowing only one thread at a time to execute a critical section of code or access a shared resource.
- Blocking Primitive: Threads attempting to acquire a held mutex are put to sleep (blocked), requiring OS context switches.
- Contrast with CAS: Mutexes are a higher-level, blocking construct. CAS is a low-level, non-blocking operation used to implement mutexes (e.g., a spinlock).
- Trade-off: Mutexes simplify programming but can lead to deadlock, priority inversion, and overhead from blocking. CAS-based algorithms are more complex but avoid these issues for fine-grained operations.
Wait-Free Algorithm
A wait-free algorithm is the strongest non-blocking progress guarantee, where every thread is guaranteed to complete its operation in a bounded number of steps, independent of the actions of other threads.
- Stronger than Lock-Free: While lock-free guarantees some thread progresses, wait-free guarantees every thread progresses. This prevents starvation.
- Implementation Complexity: Wait-free algorithms are significantly more complex to design and prove correct than lock-free ones. They often require threads to help complete other threads' operations.
- Relation to CAS: CAS can be used in wait-free constructions, but achieving wait-freedom typically requires more sophisticated coordination, such as helping protocols and careful memory management to bound retry loops.

About the author
Prasad Kumkar
CEO & MD, Inference Systems
Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over 5+ years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.
His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.
Partnered with leading AI, data, and software stack.
How We Work
Custom AI workflows for your Business
One-fit-all AI don't work for modern businesses. At Inferensys, we aim to understand your business & custom requirements; which we use to define most efficient agentic workflows, the data, and the tools for your business.
01
Review the use case
We understand the task, the users, and where AI can actually help.
Read more02
Pick the right approach
We define what needs search, automation, or product integration.
Read more03
Build the first useful version
We implement the part that proves the value first.
Read more04
Improve from there
We add the checks and visibility needed to keep it useful.
Read moreThe first call is a practical review of your use case and the right next step.
Talk to Us