Inferensys

Glossary

Compare-and-Swap (CAS)

Compare-and-Swap (CAS) is an atomic operation that updates a memory location only if its current value matches an expected value, returning a boolean indicating success.
Stylish WeWork-like workspace with hot desks and document wall, professional searching through enterprise knowledge base on a mounted ultrawide display, warm industrial pendants overhead.
CONCURRENCY PRIMITIVE

What is Compare-and-Swap (CAS)?

A foundational atomic instruction for lock-free and wait-free concurrent programming.

Compare-and-Swap (CAS) is an atomic hardware instruction that updates a memory location to a new value only if its current value matches an expected value, returning a boolean indicating success or failure. This read-modify-write operation is the cornerstone of non-blocking algorithms, enabling concurrent data structure updates without traditional locks. It is a key primitive for implementing lock-free and wait-free synchronization in systems programming and parallel computing frameworks.

In practice, CAS is used to build atomic counters, lock-free stacks, queues, and hash maps. Its implementation relies on low-level CPU support, such as the cmpxchg instruction on x86. A failed CAS operation, often due to a race condition where another thread modified the value, typically triggers a retry loop. This pattern is fundamental to optimistic concurrency control, where threads proceed assuming no conflict and roll back via CAS failure, contrasting with pessimistic locking mechanisms like mutexes.

ATOMIC OPERATION

Key Characteristics of CAS

Compare-and-Swap (CAS) is a fundamental atomic instruction for lock-free and wait-free concurrent programming. It is the cornerstone for implementing non-blocking data structures and synchronization primitives.

01

Atomic Read-Modify-Write

A Compare-and-Swap operation performs three actions as a single, indivisible atomic unit:

  • Reads the current value from a shared memory location.
  • Compares this value to an expected value provided by the calling thread.
  • Conditionally writes a new value only if the comparison succeeds.

This atomicity is guaranteed by the hardware (e.g., via a CPU instruction like CMPXCHG on x86) and prevents the classic lost update problem that occurs when interleaved reads and writes from multiple threads corrupt shared state.

02

Lock-Free Synchronization Foundation

CAS enables lock-free (and often wait-free) algorithms. Unlike a mutex, which suspends (blocks) a thread if the lock is held, a thread using CAS in a loop continually retries the operation without being descheduled by the OS.

Key advantages:

  • Avoids deadlock: No holding of locks, eliminating circular wait conditions.
  • Improves scalability: Under contention, threads make progress via retries rather than blocking, which is costly with many threads.
  • Priority inversion resistance: A high-priority thread is not blocked indefinitely by a low-priority thread holding a lock.

It is the primary primitive for building non-blocking stacks, queues, and reference-counting systems like std::shared_ptr.

03

The ABA Problem

The ABA problem is a classic challenge when using CAS. It occurs when a location's value changes from A to B and back to A between the time a thread reads it and attempts the CAS. The CAS will succeed, but the state of the shared data structure may have changed invalidly.

Example: A thread reads the head pointer A of a lock-free stack. Another thread pops A, deallocates it, allocates a new node that coincidentally has the same address A, and pushes it. The first thread's CAS (expected=A, new=X) succeeds, corrupting the stack.

Common solutions:

  • Tagged pointers: Use a spare bits (a version counter or tag) alongside the pointer. The combined value changes even if the address repeats.
  • Hazard pointers: Track which pointers are in use to prevent premature reclamation.
04

Hardware and Language Support

CAS is implemented directly in CPU instruction sets and exposed through high-level language intrinsics and libraries.

Hardware Instructions:

  • x86/x86-64: CMPXCHG (compare and exchange), CMPXCHG8B, CMPXCHG16B.
  • ARM: LDREX/STREX (Load-Exclusive/Store-Exclusive) instruction pair.
  • RISC-V: LR.W/SC.W (Load-Reserved/Store-Conditional).

Language & Library APIs:

  • C/C++: std::atomic_compare_exchange_strong and _weak in <atomic>.
  • Java: java.util.concurrent.atomic.AtomicInteger.compareAndSet().
  • Rust: AtomicPtr::compare_exchange in std::sync::atomic.

These abstractions allow developers to write portable lock-free code without writing assembly.

05

Memory Ordering Semantics

A CAS operation is not just an atomic update; it also acts as a memory barrier or fence, enforcing ordering constraints on memory operations around it. This is critical for correctness in weak memory models (like those of ARM and PowerPC).

When a CAS succeeds, it typically has release semantics for the write, making prior writes visible to other threads. It also has acquire semantics for the read, ensuring subsequent reads by this thread see the latest data.

Example in C++: std::atomic<int> val; int expected = val.load(std::memory_order_relaxed); do { // ... compute newVal ... } while (!val.compare_exchange_weak(expected, newVal, std::memory_order_acq_rel, std::memory_order_acquire));

The chosen memory orders (acq_rel for success, acquire for failure) prevent dangerous reordering of instructions around the CAS loop.

06

Contrast with Related Primitives

CAS is one of several atomic Read-Modify-Write (RMW) operations. Understanding its relation to others clarifies its use cases.

  • Fetch-and-Add (FAA): Atomically adds a value and returns the old value. Ideal for uncontended counters. Simpler than CAS for increment/decrement but cannot implement complex conditional updates.
  • Load-Linked / Store-Conditional (LL/SC): A more flexible pair of instructions used in ARM/RISC-V. LL loads a value and monitors the address. A subsequent SC succeeds only if no other store occurred to that address. This can avoid the ABA problem more naturally but may suffer from spurious failures.
  • Test-and-Set (TAS): An older primitive that atomically sets a bit and returns its old value. Used to build simple locks (spinlocks) but is less versatile for complex lock-free structures compared to CAS.

CAS provides a powerful balance of flexibility and widespread hardware support, making it the default choice for most non-blocking algorithm designs.

CONCURRENCY MECHANISMS

CAS vs. Other Synchronization Primitives

A comparison of atomic Compare-and-Swap with traditional blocking and non-blocking synchronization constructs, highlighting their mechanisms, guarantees, and typical use cases in parallel systems.

Feature / MechanismCompare-and-Swap (CAS)Mutex (Lock)SemaphoreCondition Variable

Synchronization Type

Non-blocking (lock-free/wait-free building block)

Blocking (mutual exclusion)

Blocking (counting/resource control)

Blocking (event notification)

Primary Guarantee

Atomic read-modify-write of a single memory location.

Mutual exclusion for a critical section.

Controls access for N concurrent threads to a resource pool.

Allows threads to wait for a predicate to become true.

Progress Guarantee

System-wide progress (lock-free when used correctly).

No progress guarantee if holder is descheduled; can lead to deadlock.

No progress guarantee; can lead to deadlock if improperly used.

No progress guarantee; requires correct signaling to avoid missed wake-ups.

Typical Use Case

Implementing lock-free data structures (queues, stacks, counters).

Protecting short, simple critical sections with low contention.

Managing a finite pool of resources (e.g., database connections, threads).

Coordinating producer-consumer patterns or complex state changes.

Contention Performance (High)

Good (spins/retries, but avoids context switches).

Poor (context switches, scheduler involvement).

Poor (context switches, scheduler involvement).

Poor (context switches, scheduler involvement).

Memory Overhead

Low (single memory location).

Medium (kernel object + associated metadata).

Medium (kernel object + count).

High (requires associated mutex and predicate state).

Composability Risk

High (requires careful algorithm design to avoid ABA problem).

Medium (risk of deadlock with multiple locks).

Low (simple count-based API).

High (complex to use correctly; prone to subtle bugs).

Applicable Scope

Fine-grained, single-variable updates.

Coarse-grained, arbitrary code sections.

Resource pool access control.

Complex multi-threaded state coordination.

COMPARE-AND-SWAP (CAS)

Frequently Asked Questions

Compare-and-Swap (CAS) is a foundational atomic operation for lock-free and wait-free concurrent programming. This FAQ addresses its core mechanics, applications, and relationship to other parallelism concepts.

Compare-and-Swap (CAS) is an atomic instruction used in concurrent programming that updates a memory location only if its current value matches an expected value, returning a boolean indicating success. The operation is performed as a single, uninterruptible hardware instruction, preventing race conditions between threads. Its typical signature is CAS(memory_location, expected_value, new_value). The processor reads the value at memory_location, compares it to expected_value, and if they match, atomically writes new_value. It returns true on success; if the values differ (meaning another thread modified the location), it returns false and the calling thread must retry or follow an alternative logic path. This "optimistic concurrency" approach forms the basis for non-blocking data structures.

Prasad Kumkar

About the author

Prasad Kumkar

CEO & MD, Inference Systems

Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over 5+ years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.

His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.