Inferensys

Glossary

Cache Coherence

Cache coherence is a property of a multiprocessor system that ensures all caches have a consistent view of shared memory, meaning every read of a location returns the most recently written value.
PARALLELISM AND SCHEDULING

What is Cache Coherence?

Cache coherence is a fundamental property of shared-memory multiprocessor systems, including many modern NPU and GPU designs, that keeps data consistent across distributed caches.

Cache coherence is maintained by a hardware protocol that keeps a single, consistent view of shared memory data across all processor caches in a multiprocessor system. It guarantees that any read of a memory location returns the most recently written value, regardless of which processor performed the write. This is essential for correct parallel program execution because it prevents threads from operating on stale or conflicting copies of data. Protocols such as MESI (Modified, Exclusive, Shared, Invalid) track a state for each cache line and exchange messages between caches to enforce these rules.

In NPU acceleration, cache coherence protocols manage data shared between multiple processing cores or tiles. Without coherence, parallel kernels could produce non-deterministic, erroneous results. The protocol's overhead, whether from snoop broadcasts or directory messages, is a critical design trade-off that affects both performance and scalability. Efficient coherence is vital for algorithms that use shared memory for inter-thread communication, and it is closely related to the system's memory consistency model, which defines the visible ordering of memory operations.

CORE PRINCIPLES

Key Properties of a Coherent System

Cache coherence is a fundamental correctness property in shared-memory multiprocessor systems. It ensures that all processors observe a consistent view of memory, preventing data corruption and logical errors. The following principles define the mechanisms and guarantees required for a system to be considered coherent.

01

Write Propagation

This property guarantees that a write operation to a memory location by one processor will eventually become visible to all other processors. It prevents processors from reading stale data from their private caches after an update has occurred elsewhere in the system. Coherence protocols implement this through invalidation or update messages that are broadcast or sent to sharers of the cache line.

  • Invalidation-based protocols: Mark other copies as invalid, forcing a miss on the next read.
  • Update-based protocols: Propagate the new data value directly to all other caches holding the line.
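To make the two bullet points concrete, here is a minimal Python sketch of write propagation among toy private caches. The `Cache` class and the `write`/`read` helpers are hypothetical illustrations, not any real simulator's API, and the sketch assumes write-through to memory so that a miss always refills the latest value. A third policy, `"none"`, leaves stale copies in place to show what propagation prevents.

```python
class Cache:
    """A toy private cache: an address present in `lines` is valid; absent means Invalid."""
    def __init__(self):
        self.lines = {}

def write(caches, memory, writer, addr, value, policy):
    """Processor `writer` stores `value` to `addr`, then propagates per `policy`.

    policy: "invalidate", "update", or "none" (no coherence, to expose stale reads).
    Write-through to `memory` is an assumption that keeps the sketch short.
    """
    caches[writer].lines[addr] = value
    memory[addr] = value
    for i, cache in enumerate(caches):
        if i == writer or addr not in cache.lines:
            continue  # only caches that actually hold the line are affected
        if policy == "invalidate":
            del cache.lines[addr]        # this cache's next read will miss
        elif policy == "update":
            cache.lines[addr] = value    # push the new value directly
        # policy == "none": the stale copy is left in place

def read(caches, memory, reader, addr):
    """Return (value, 'hit' | 'miss') for a load by processor `reader`."""
    cache = caches[reader]
    if addr in cache.lines:
        return cache.lines[addr], "hit"
    cache.lines[addr] = memory[addr]     # miss: refill from memory
    return cache.lines[addr], "miss"

def demo(policy):
    memory = {0x40: 0}
    caches = [Cache(), Cache()]
    read(caches, memory, 1, 0x40)             # P1 caches the old value 0
    write(caches, memory, 0, 0x40, 7, policy) # P0 writes 7
    return read(caches, memory, 1, 0x40)      # what does P1 see now?

for policy in ("invalidate", "update", "none"):
    print(policy, demo(policy))
```

Under invalidation, P1's later read misses but returns 7; under update it hits and returns 7; without any propagation it hits on the stale 0.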
02

Write Serialization

Also known as write ordering, this property ensures that all processors observe writes to the same memory location in the same order. If two processors write to the same address, the system must define a single global order for those writes, and all subsequent reads by any processor must reflect that order. This is critical for implementing synchronization primitives such as locks and barriers. The coherence protocol establishes this order by serializing writes to each line through a single point of coordination, such as the shared bus in a snooping protocol or the home directory in a directory-based protocol.
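A minimal sketch of a single serialization point, assuming one hypothetical `HomeNode` object per cache line whose lock stands in for the home directory's in-order processing of requests. Four threads issue writes concurrently; because only one write is processed at a time, every registered cache observes the identical sequence. Real directories serialize with request queues and transient states rather than a software lock, so this only illustrates the ordering guarantee.

```python
import threading

class HomeNode:
    """Hypothetical home node for a single cache line: the one serialization point.

    Every write request is processed under one lock, so the home node alone
    decides the global order and notifies every registered cache in that order.
    """
    def __init__(self):
        self._lock = threading.Lock()
        self.value = 0
        self.order = []        # global write order chosen by the home node
        self._cache_logs = []  # order in which each cache learns about the writes

    def register_cache(self):
        log = []
        self._cache_logs.append(log)
        return log

    def write(self, writer_id, value):
        with self._lock:  # at most one write to this line is processed at a time
            self.value = value
            self.order.append((writer_id, value))
            for log in self._cache_logs:
                log.append((writer_id, value))

def run_demo(num_writers=4, writes_each=100, num_caches=3):
    home = HomeNode()
    logs = [home.register_cache() for _ in range(num_caches)]

    def writer(wid):
        for i in range(writes_each):
            home.write(wid, wid * 1000 + i)

    threads = [threading.Thread(target=writer, args=(w,)) for w in range(num_writers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return home, logs

home, logs = run_demo()
# Every cache observed exactly the same sequence of writes.
print(all(log == home.order for log in logs), len(home.order))
```

The interleaving of writers differs from run to run, but within any single run all caches agree on it, which is exactly what write serialization requires.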

03

Coherence States (MSI/MESI/MOESI)

Cache lines are tracked using finite state machines. Common protocols define states like:

  • Modified (M): The cache holds the only valid copy and the data is dirty (different from main memory).
  • Exclusive (E): The cache holds the only valid copy, but it is clean (matches main memory).
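  • Shared (S): The cache holds a valid, read-only copy, and other caches may also hold it. In MSI and MESI this copy is clean; in MOESI it may be dirty if another cache holds the line in Owned state.
  • Invalid (I): The cache line holds no valid data; any access must first obtain a fresh copy.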
  • Owned (O): (MOESI) The cache holds a dirty copy and is responsible for supplying it to other caches, but other caches may hold it in Shared state.

Transitions between these states are triggered by local processor operations (read, write) and coherence messages from other caches or the directory.
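The transitions can be sketched as a table-driven function for one cache line under a snooping MESI protocol. The event names (`PrRd`, `PrWr`, `BusRd`, `BusRdX`, `Flush`) follow common textbook convention; the function itself is a hypothetical illustration, and it uses `BusRdX` for the S-to-M upgrade, whereas many real designs issue a cheaper upgrade request instead.

```python
from enum import Enum

class State(Enum):
    M = "Modified"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"

def next_state(state, event, others_have_copy=False):
    """Return (new_state, bus_action) for one cache line under snooping MESI.

    'PrRd'/'PrWr' are loads/stores from the local processor; 'BusRd'/'BusRdX'
    are requests snooped from other caches. `others_have_copy` models the shared
    signal returned on a read miss. bus_action is None when no bus transaction
    is needed.
    """
    if event == "PrRd":
        if state is State.I:  # read miss: fetch the line
            return (State.S if others_have_copy else State.E), "BusRd"
        return state, None  # read hit in M, E, or S
    if event == "PrWr":
        if state in (State.M, State.E):
            return State.M, None  # no other copies exist, so no bus traffic
        return State.M, "BusRdX"  # from S or I: obtain exclusive ownership first
    if event == "BusRd":  # another cache wants to read
        if state is State.M:
            return State.S, "Flush"  # supply the dirty data and write it back
        if state is State.E:
            return State.S, None
        return state, None
    if event == "BusRdX":  # another cache wants to write
        return State.I, ("Flush" if state is State.M else None)
    raise ValueError(f"unknown event: {event!r}")

# Core 0 reads, core 1 reads, then core 1 writes.
c0 = c1 = State.I
c0, _ = next_state(c0, "PrRd")                         # c0: E
c0, _ = next_state(c0, "BusRd")                        # c1's miss demotes c0 to S
c1, _ = next_state(c1, "PrRd", others_have_copy=True)  # c1: S
c1, action = next_state(c1, "PrWr")                    # c1: M, issues BusRdX
c0, _ = next_state(c0, "BusRdX")                       # c0: I
print(c0.name, c1.name, action)
```

The E state is what makes MESI cheaper than MSI for private data: a core that read a line nobody else holds can later write it without any bus transaction.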

04

Snooping vs. Directory-Based Protocols

These are the two primary architectural approaches to implementing coherence.

  • Snooping (Bus-Based): All caches monitor (snoop) a shared broadcast interconnect (e.g., a bus) for transactions. When a write is seen, caches invalidate or update their local copies. This is simple but does not scale well to many cores due to broadcast traffic.
  • Directory-Based: A centralized or distributed directory tracks which caches hold copies of each memory block. On a write, point-to-point messages are sent only to the caches that actually hold the data (the sharers). This scales to large core counts (dozens to hundreds) used in modern servers and NPUs, as it avoids broadcast storms.
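A back-of-the-envelope comparison of how many caches must handle one invalidation request under each approach. This is a deliberately simplified count, assumed for illustration only: it ignores acknowledgements, data responses, directory lookup latency, and the snoop filters real systems use to cut broadcast traffic. The helper names are hypothetical.

```python
def snoop_recipients(num_caches):
    """Snooping: the request is broadcast, so every other cache must snoop it."""
    return num_caches - 1

def directory_recipients(sharers, requester):
    """Directory: invalidations go only to caches that actually hold the line."""
    return len(set(sharers) - {requester})

for n in (4, 16, 64, 256):
    # Hypothetical workload: the line is held by the requester and one other core.
    print(n, snoop_recipients(n), directory_recipients({0, 1}, requester=0))
```

The broadcast cost grows with core count while the directory cost grows only with the number of actual sharers, which is why directories dominate at large scale.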
05

False Sharing

A major performance pitfall in which two unrelated variables reside on the same cache line. Because coherence is tracked per cache line, writes by different processors to these different variables are treated as conflicting writes to the same line, causing unnecessary invalidation traffic and cache-line ping-pong. This can severely degrade performance even though the processors are not logically sharing any data. Mitigation involves padding data structures or aligning variables to cache-line boundaries so that they occupy separate lines.
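The padding fix can be sketched with Python's `ctypes` by comparing field offsets. This assumes a 64-byte cache line (common, but not universal; check the target hardware) and that the structure itself starts on a line boundary; `Counters`, `PaddedCounters`, and `line_index` are hypothetical names.

```python
import ctypes

CACHE_LINE = 64  # assumed line size in bytes

class Counters(ctypes.Structure):
    # Two per-thread counters packed side by side: both land on one line.
    _fields_ = [("a", ctypes.c_int64), ("b", ctypes.c_int64)]

class PaddedCounters(ctypes.Structure):
    # Filler bytes push `b` onto the next cache line.
    _fields_ = [
        ("a", ctypes.c_int64),
        ("_pad", ctypes.c_char * (CACHE_LINE - ctypes.sizeof(ctypes.c_int64))),
        ("b", ctypes.c_int64),
    ]

def line_index(offset):
    """Cache line a field falls on, assuming the struct starts on a line boundary."""
    return offset // CACHE_LINE

for cls in (Counters, PaddedCounters):
    same = line_index(cls.a.offset) == line_index(cls.b.offset)
    print(cls.__name__, cls.b.offset, "same line" if same else "separate lines")
```

Padding alone only helps if the base address is line-aligned; in C or C++ the usual tools are `alignas(64)` or, in C++17, `std::hardware_destructive_interference_size`.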

06

Memory Consistency Model Interaction

Cache coherence and memory consistency are separate but related concepts. Coherence defines the behavior of reads and writes to a single memory location. Consistency (e.g., Sequential Consistency, Total Store Order, Release Consistency) defines the observable order of reads and writes to different memory locations across threads.

A system can be coherent but not sequentially consistent. For example, writes to different addresses by one processor may be observed in different orders by other processors, even though each individual location's history is coherent. The coherence protocol must provide the necessary guarantees (like write serialization) that the chosen consistency model relies upon.
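One classic illustration of "coherent but not sequentially consistent" is the store-buffering litmus test. The sketch below enumerates every interleaving under two models: sequential consistency, and a simplified TSO-like model in which each store first enters a per-thread store buffer and reaches memory at a later drain event `F`. The model and the `outcomes` helper are simplifying assumptions, not a full TSO specification. Memory holds one coherent copy of each location in both cases, yet TSO permits the result `(0, 0)`, which SC forbids.

```python
from itertools import permutations

# Thread 0: x = 1; r0 = y      Thread 1: y = 1; r1 = x
VARS = {0: ("x", "y"), 1: ("y", "x")}  # tid -> (variable stored, variable loaded)

def outcomes(events, must_precede, store_buffers):
    """All (r0, r1) results over interleavings of `events` that honour `must_precede`."""
    results = set()
    for order in permutations(events):
        pos = {ev: i for i, ev in enumerate(order)}
        if any(pos[a] > pos[b] for a, b in must_precede):
            continue
        memory = {"x": 0, "y": 0}  # one coherent copy of each location
        buffers = {0: {}, 1: {}}   # per-thread store buffers (used only for TSO)
        regs = {}
        for tid, kind in order:
            stored, loaded = VARS[tid]
            if kind == "W":
                (buffers[tid] if store_buffers else memory)[stored] = 1
            elif kind == "F":       # drain this thread's buffer to memory
                memory.update(buffers[tid])
                buffers[tid].clear()
            else:                   # "R": forward from own buffer, else read memory
                regs[tid] = buffers[tid].get(loaded, memory[loaded])
        results.add((regs[0], regs[1]))
    return results

program_order = [((0, "W"), (0, "R")), ((1, "W"), (1, "R"))]
sc = outcomes([(t, k) for t in (0, 1) for k in "WR"], program_order, store_buffers=False)
tso = outcomes([(t, k) for t in (0, 1) for k in "WRF"],
               program_order + [((t, "W"), (t, "F")) for t in (0, 1)],
               store_buffers=True)
print(sorted(sc))
print(sorted(tso))
```

Each location's history stays coherent in both runs; only the relative ordering of one thread's store and its later load to a different address changes, which is a consistency-model question rather than a coherence one.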

PROTOCOL ARCHITECTURES

Cache Coherence Protocol Comparison

A comparison of fundamental hardware-based cache coherence protocols, detailing their operational mechanisms, performance characteristics, and implementation trade-offs for multi-core NPU and CPU systems.

Protocol Feature / Metric | Snooping (Bus-Based) | Directory-Based | Token-Based
Primary Coordination Mechanism | Broadcast and snoop on a shared bus or interconnect | Point-to-point messages via a centralized or distributed directory | Circulation of ownership tokens
Scalability (Core Count) | Poor (typically < 32 cores) | Excellent (100s to 1000s+ cores) | Good (10s to 100s of cores)
Average Latency (Shared Read) | Low (if the bus is uncontended) | Medium (directory lookup overhead) | Variable (depends on token location)
Bandwidth Consumption | High (every coherence request is broadcast) | Low (point-to-point invalidations/updates) | Medium (token-passing messages)
Write Serialization Enforcement | Bus arbitration provides a total order | Directory acts as the serialization point | Token possession guarantees serialization
Silicon Area Overhead | Low | Medium to High (directory storage) | Low to Medium (token state per block)
Typical Implementation | Bus-based MESI and MOESI protocols | AMD Infinity Fabric, Intel QPI | Token Coherence (mainly research designs)
Handles False Sharing Efficiently | No (coherence is tracked per cache line) | No (coherence is tracked per cache line) | No (coherence is tracked per cache line)

CACHE COHERENCE

Frequently Asked Questions

Cache coherence is a fundamental property of shared-memory multiprocessor systems, ensuring data consistency across private caches. This FAQ addresses core concepts, protocols, and its critical role in parallel computing and hardware acceleration.

What is cache coherence and why is it necessary?

Cache coherence is a property of a shared-memory multiprocessor system that guarantees all processor caches have a consistent view of shared data, meaning every read of a memory location returns the most recently written value to that location. It is necessary because without it, multiple cached copies of the same memory block could hold different values, leading to incorrect program execution, race conditions, and violations of the memory consistency model. This inconsistency arises when one processor writes to its local cached copy, making that copy dirty, while other processors retain stale, clean copies. Coherence protocols actively manage cache-line states to maintain the single-writer, multiple-reader (SWMR) invariant across the system.
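The SWMR invariant can be expressed as a small check over the MOESI states that all caches hold for one line. This is a hypothetical helper for illustration: it treats M and E as granting write permission (E can be upgraded to M silently) and S and O as read-only.

```python
def satisfies_swmr(states):
    """Check the single-writer, multiple-reader (SWMR) invariant for one line.

    `states` holds one MOESI letter per cache.
    """
    writers = sum(s in ("M", "E") for s in states)
    readers = sum(s in ("S", "O") for s in states)
    owners = states.count("O")
    if writers > 1:
        return False          # two caches could both write
    if writers == 1:
        return readers == 0   # a writer must hold the only valid copy
    return owners <= 1        # at most one cache is responsible for the dirty data

examples = [["M", "I", "I"], ["S", "S", "I"], ["O", "S", "S"], ["M", "S", "I"], ["E", "E", "I"]]
for states in examples:
    print(states, satisfies_swmr(states))
```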


About the author

Prasad Kumkar

CEO & MD, Inference Systems

Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over more than five years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.

His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.