A semaphore is a synchronization variable that maintains a non-negative integer count to control access to a finite pool of shared resources, such as memory buffers, database connections, or hardware units. It provides two atomic operations: wait (or P) decrements the count, potentially blocking the calling thread if the count is zero, and signal (or V) increments the count, potentially unblocking a waiting thread. This mechanism enforces mutual exclusion and limits concurrency to a specified degree, preventing resource exhaustion and race conditions in multi-threaded or multi-process environments.

What is a Semaphore?
A semaphore is a foundational synchronization mechanism in concurrent programming used to control access to shared resources.
Semaphores are classified as counting semaphores, which can represent multiple available resources, or binary semaphores, which act as a simple mutex with a count of one. They are a lower-level primitive than mutexes or condition variables and form the basis for implementing higher-level concurrency constructs. In NPU acceleration and parallel computing, semaphores coordinate work between thread blocks or manage access to on-chip shared memory and hardware queues, ensuring deterministic execution and preventing data corruption when multiple processing elements contend for the same accelerator resource.
Core Characteristics of Semaphores
A semaphore is a synchronization variable used to control access to a common resource by multiple threads, maintaining a count to permit a specified number of concurrent accesses.
The Internal Counter
The core of a semaphore is an integer counter. This value represents the number of available resource units or permits.
- Positive Value: Indicates the number of threads that can acquire the semaphore without blocking.
- Zero Value: Means no permits are available; the next thread to call acquire() will block.
- Negative Value: Not used in classical semaphores; the count is typically non-negative. The blocking behavior for a zero count is managed by the semaphore's wait queue, not by a negative integer.
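The counter states listed above can be observed with Python's standard threading.Semaphore. This is a minimal sketch: the permit count of 2 is an arbitrary choice, and the non-blocking form acquire(blocking=False) is used so that the zero case reports False instead of blocking the script.

```python
import threading

# Two permits; the value 2 is an arbitrary choice for illustration.
sem = threading.Semaphore(2)

first = sem.acquire(blocking=False)   # count 2 -> 1, succeeds
second = sem.acquire(blocking=False)  # count 1 -> 0, succeeds
third = sem.acquire(blocking=False)   # count is 0: a blocking call would wait here

sem.release()                         # count 0 -> 1
fourth = sem.acquire(blocking=False)  # a permit is available again

print(first, second, third, fourth)   # True True False True
```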
Atomic Operations: Wait (P) and Signal (V)
Semaphores are manipulated via two fundamental, atomic (indivisible) operations, ensuring thread-safe updates to the internal counter.
- Wait (P/proberen) or acquire(): If the count is greater than zero, decrements it and returns immediately. If the count is zero, the calling thread is blocked and placed in a wait queue until a permit is released.
- Signal (V/verhogen) or release(): Increments the semaphore count. If threads are waiting in the queue, one is typically awakened and allowed to proceed.
These operations form the basis for all higher-level synchronization patterns built with semaphores.
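To make the two operations concrete, here is a sketch of a counting semaphore built on a condition variable. It follows the non-negative convention, blocking while the count is zero rather than letting the count go negative. The class name SimpleSemaphore is hypothetical, and in real code threading.Semaphore would normally be used instead.

```python
import threading


class SimpleSemaphore:
    """Illustrative counting semaphore; not a replacement for threading.Semaphore."""

    def __init__(self, permits: int) -> None:
        if permits < 0:
            raise ValueError("permits must be non-negative")
        self._count = permits
        # The condition's internal lock makes each operation atomic
        # with respect to the others; its wait set is the wait queue.
        self._cond = threading.Condition()

    def wait(self) -> None:
        """P / acquire: block while the count is zero, then take one permit."""
        with self._cond:
            while self._count == 0:
                self._cond.wait()
            self._count -= 1

    def signal(self) -> None:
        """V / release: return one permit and wake one waiter, if any."""
        with self._cond:
            self._count += 1
            self._cond.notify()


sem = SimpleSemaphore(1)
sem.wait()     # takes the only permit
sem.signal()   # returns it
```

The while loop around the inner wait matters: a woken thread re-checks the count before decrementing, so the count never drops below zero.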
Binary vs. Counting Semaphores
Semaphores are categorized by the range of their internal counter.
- Binary Semaphore: The counter is restricted to values 0 and 1. It can serve as a simple lock, guaranteeing mutual exclusion for a single resource. Unlike a true mutex, it does not track an owner: by convention the thread that acquires it also releases it, but the semaphore itself does not enforce this.
- Counting Semaphore: The counter can take any non-negative integer value. It is used to control access to a pool of identical resources (e.g., a connection pool of 10 database connections). It does not enforce ownership; any thread can signal a semaphore that another thread acquired.
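A counting semaphore used as a pool limiter might look like the following sketch. The pool size of 3, the ten worker threads, and the sleep that stands in for real work are all assumptions made for illustration.

```python
import threading
import time

POOL_SIZE = 3                      # hypothetical pool of 3 identical connections
pool = threading.Semaphore(POOL_SIZE)

in_use = 0
peak = 0
stats_lock = threading.Lock()      # protects the two counters above


def use_connection() -> None:
    global in_use, peak
    with pool:                     # acquire a permit; released when the block exits
        with stats_lock:
            in_use += 1
            peak = max(peak, in_use)
        time.sleep(0.01)           # stand-in for real work on the connection
        with stats_lock:
            in_use -= 1


threads = [threading.Thread(target=use_connection) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"peak concurrent users: {peak} (limit {POOL_SIZE})")
```

However the threads are scheduled, the recorded peak cannot exceed POOL_SIZE, because each thread increments the counter only while holding a permit.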
Synchronization vs. Mutual Exclusion
While often used to implement mutexes, semaphores are a more general primitive for synchronization.
- Mutual Exclusion: A binary semaphore can ensure only one thread enters a critical section at a time.
- Event Signaling / Condition Synchronization: A semaphore (often initialized to 0) can make one thread wait for an event signaled by another. Thread A waits on the semaphore; Thread B performs work and then signals, allowing A to proceed. This coordinates the order of execution between threads.
- Resource Pool Management: A counting semaphore directly models the availability of multiple resource instances.
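The event-signaling pattern can be sketched with a semaphore initialized to 0. The thread names and log messages below are illustrative only.

```python
import threading

ready = threading.Semaphore(0)   # starts at 0: the event has not happened yet
log = []


def thread_a() -> None:
    ready.acquire()              # blocks until thread_b signals
    log.append("A: proceeding after event")


def thread_b() -> None:
    log.append("B: work done")
    ready.release()              # signal the event


a = threading.Thread(target=thread_a)
b = threading.Thread(target=thread_b)
a.start()
b.start()
a.join()
b.join()

print(log)   # ['B: work done', 'A: proceeding after event']
```

The order of the log entries does not depend on which thread starts first: A cannot pass acquire() until B has called release(), and B logs before it releases.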
The Wait Queue and Scheduling
When a thread executes acquire() on a semaphore with a zero count, it is blocked and placed into an associated wait queue. The operating system scheduler removes it from the ready-to-run state.
- Queue Policy: The order of thread awakening (FIFO, priority-based) is implementation-defined but is crucial for avoiding starvation.
- Awakening: A release() operation by another thread increments the count and typically selects one waiting thread from the queue to unblock and resume execution. The awakened thread's acquire() completes, and the semaphore count is often left at zero.
Key Distinction from Mutexes and Condition Variables
Understanding how semaphores differ from other primitives is critical for correct usage.
- Vs. Mutex: A mutex has a notion of ownership; the thread that locks it must unlock it. A semaphore has no ownership; any thread can signal it. This makes semaphores more flexible but also more prone to programming errors.
- Vs. Condition Variable: A condition variable is always used with a mutex to protect a shared condition (predicate). A semaphore internally manages its own condition (count > 0) and waiting mechanism, making it self-contained but less expressive for complex conditions.
- Historical Context: Semaphores were defined by Edsger Dijkstra in 1965, predating many modern synchronization constructs and forming their theoretical foundation.
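The ownership difference can be checked directly in Python. One caveat: Python's plain threading.Lock does not track an owner, so this sketch uses threading.RLock, which does, as a stand-in for an owned mutex. The helper release_from_other_thread is hypothetical.

```python
import threading


def release_from_other_thread(primitive) -> str:
    """Acquire in the calling thread, then try to release from a second thread."""
    primitive.acquire()
    outcome = []

    def releaser() -> None:
        try:
            primitive.release()
            outcome.append("released")
        except RuntimeError:
            # RLock refuses a release from a thread that does not own it.
            outcome.append("refused")

    t = threading.Thread(target=releaser)
    t.start()
    t.join()
    return outcome[0]


sem_result = release_from_other_thread(threading.Semaphore(1))
lock_result = release_from_other_thread(threading.RLock())

print(sem_result, lock_result)   # released refused
```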
How Semaphores Work: The Wait and Signal Operations
A semaphore is a fundamental synchronization primitive used in concurrent programming to control access to shared resources, such as memory regions or hardware units, across multiple threads or processes.
A semaphore maintains an internal integer count representing the number of available units of a shared resource. The core operations are wait (or P) and signal (or V). The wait() operation decrements the count when it is positive; when the count is zero, the calling thread is blocked until another thread signals. The signal() operation increments the count and may unblock a waiting thread. (Some implementations instead let the count go negative and use its magnitude as the number of waiting threads; this article uses the non-negative convention.) This mechanism ensures that at most a specified number of threads access a resource concurrently, and it enforces mutual exclusion when configured as a binary semaphore (count of 1).
In parallel computing and NPU acceleration, semaphores coordinate access to shared memory buffers, hardware queues, or computational units across multiple thread blocks or streaming multiprocessors. They are lower-level than mutexes, providing more flexible control over resource pools. Used carelessly they can still cause deadlock or starvation, so acquisition order and wake-up policy matter. Semaphores are implemented using atomic operations to ensure the integrity of the count variable, which is crucial for maintaining memory consistency in multi-core systems.
Semaphore vs. Mutex vs. Condition Variable
A comparison of three fundamental concurrency control mechanisms used to coordinate access to shared resources and manage thread execution order in parallel systems.
| Feature | Semaphore | Mutex | Condition Variable |
|---|---|---|---|
| Primary Purpose | Controls access to a pool of identical resources | Enforces exclusive access to a single resource (mutual exclusion) | Enables threads to wait for a specific program condition to become true |
| Internal State | Non-negative integer counter | Boolean (locked/unlocked) with an owning thread | No internal state; relies on an associated predicate and mutex |
| Ownership Concept | No ownership; any thread can signal (post) or wait (wait) | Ownership; only the thread that locked it can unlock it | No ownership; used in conjunction with a mutex that provides the lock context |
| Typical Use Case | Limiting concurrent access (e.g., connection pools, rate limiting) | Protecting a critical section of code or a shared data structure | Implementing producer-consumer queues, thread rendezvous, or waiting for complex states |
| Synchronization Granularity | Resource-level (N concurrent accesses) | Code/Data-level (1 exclusive access) | Condition-level (wait/signal on predicate changes) |
| Signaling Mechanism | post() (or signal()) increments counter; can be called by any thread | N/A (unlock() releases ownership and lets one blocked thread take the lock; it conveys no event) | signal() or broadcast() notifies one or all waiting threads; usually called while holding the associated mutex |
| Common Operations | wait() (P), post() (V), try_wait() | lock(), unlock(), try_lock() | wait(mutex), signal(), broadcast() |
| Risk of Priority Inversion | Possible, depending on implementation | High risk without priority inheritance protocols | Inherits risk from the associated mutex |
Semaphore Use Cases in AI/ML Systems
A semaphore is a synchronization variable used to control concurrent access to shared resources. In AI/ML systems, semaphores are fundamental for managing parallelism, preventing race conditions, and ensuring deterministic execution across distributed hardware.
Controlling Access to Shared Model Weights
During online learning or federated averaging, multiple worker threads or processes may need to read and update a central model. A semaphore limits the number of concurrent writers, and to prevent data corruption that limit must be one: a semaphore with a count of 1 (a binary semaphore) acts as a mutex so that only one worker performs a gradient update at a time, maintaining weight consistency.
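A toy version of this pattern, with a two-element list standing in for model weights, is sketched below. The gradient values, learning rate, and worker count are made up for illustration; a real system would update tensors, not Python lists.

```python
import threading

weights = [0.0, 0.0]                     # toy "model" with two parameters
update_gate = threading.Semaphore(1)     # binary semaphore: one writer at a time


def worker(gradient, steps, lr=1.0) -> None:
    for _ in range(steps):
        with update_gate:
            # The whole read-modify-write happens while holding the permit.
            for i, g in enumerate(gradient):
                weights[i] = weights[i] - lr * g


workers = [
    threading.Thread(target=worker, args=([1.0, -2.0], 1000)) for _ in range(4)
]
for w in workers:
    w.start()
for w in workers:
    w.join()

print(weights)   # [-4000.0, 8000.0]
```

With four workers each applying 1000 updates, every update is counted exactly once, so the final weights are fully determined.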
Managing Inference Request Throttling
In high-throughput serving systems (e.g., Triton Inference Server), a semaphore can throttle the number of concurrent inference requests processed by a GPU. This prevents out-of-memory (OOM) errors by ensuring the device's memory capacity is not exceeded by too many simultaneous model executions. The semaphore count is often tuned based on the model's memory footprint and batch size.
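The throttling idea can be sketched with asyncio.Semaphore. This does not use Triton or any GPU; the functions serve and infer, the limit of 2, and the sleep that stands in for a forward pass are all assumptions for illustration.

```python
import asyncio

MAX_IN_FLIGHT = 2   # assumed limit; in practice tuned to the model's memory footprint


async def serve(requests: int) -> int:
    """Run `requests` fake inferences and return the peak concurrency observed."""
    gate = asyncio.Semaphore(MAX_IN_FLIGHT)
    in_flight = 0
    peak = 0

    async def infer(request_id: int) -> None:
        nonlocal in_flight, peak
        async with gate:                 # waits here when the limit is reached
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)    # stand-in for a model forward pass
            in_flight -= 1

    await asyncio.gather(*(infer(i) for i in range(requests)))
    return peak


peak_in_flight = asyncio.run(serve(8))
print(f"peak in-flight requests: {peak_in_flight} (limit {MAX_IN_FLIGHT})")
```

Requests beyond the limit are not rejected; they wait at the semaphore, which trades latency for bounded memory use.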
Orchestrating Data Pipeline Stages
A producer-consumer pattern is common in data loading pipelines. Semaphores coordinate the flow between stages:
- A full semaphore tracks the number of filled slots in a bounded buffer.
- An empty semaphore tracks available empty slots.
This prevents a fast image augmentation stage from overwhelming a slower training stage, ensuring smooth, deadlock-free data flow and optimal GPU utilization.
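The two-semaphore bounded buffer is a classic construction and can be sketched as follows. The capacity of 4 and the integer items are placeholders for real batches; a separate lock protects the buffer itself, since the semaphores only count slots.

```python
import threading
from collections import deque

CAPACITY = 4                            # assumed buffer size
buffer = deque()
empty = threading.Semaphore(CAPACITY)   # number of free slots
full = threading.Semaphore(0)           # number of filled slots
mutex = threading.Lock()                # protects the deque itself


def producer(items) -> None:
    for item in items:
        empty.acquire()                 # wait for a free slot
        with mutex:
            buffer.append(item)
        full.release()                  # announce a filled slot


def consumer(n: int, out: list) -> None:
    for _ in range(n):
        full.acquire()                  # wait for a filled slot
        with mutex:
            item = buffer.popleft()
        empty.release()                 # announce a free slot
        out.append(item)


consumed = []
p = threading.Thread(target=producer, args=(range(20),))
c = threading.Thread(target=consumer, args=(20, consumed))
p.start()
c.start()
p.join()
c.join()

print(consumed == list(range(20)))   # True
```

The producer can never run more than CAPACITY items ahead of the consumer, which is the backpressure described above.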
Synchronizing Distributed Training Steps
In synchronous distributed training (e.g., using Horovod or PyTorch DDP), all workers must finish their backward pass before the All-Reduce operation averages gradients. Conceptually this is a barrier. Within a single process a barrier can be built from semaphores; across machines, frameworks typically rely on their collective communication backend for this synchronization. The barrier guarantees all models update from the same synchronized state, which is critical for convergence. A missed synchronization can lead to gradient staleness and training divergence.
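To show how a barrier can be assembled from semaphores, here is a single-process, single-use sketch using the well-known turnstile construction. It is not how Horovod or PyTorch DDP synchronize; the class SemaphoreBarrier, the four workers, and the event labels are hypothetical.

```python
import threading


class SemaphoreBarrier:
    """Single-use barrier for n threads, built from two semaphores."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.arrived = 0
        self.mutex = threading.Semaphore(1)       # protects `arrived`
        self.turnstile = threading.Semaphore(0)   # closed until everyone arrives

    def wait(self) -> None:
        with self.mutex:
            self.arrived += 1
            last = self.arrived == self.n
        if last:
            self.turnstile.release()   # the last arrival opens the turnstile
        self.turnstile.acquire()       # pass through...
        self.turnstile.release()       # ...and let the next thread through


events = []
events_lock = threading.Lock()
barrier = SemaphoreBarrier(4)


def worker(rank: int) -> None:
    with events_lock:
        events.append(("backward", rank))    # stand-in for the backward pass
    barrier.wait()
    with events_lock:
        events.append(("allreduce", rank))   # stand-in for gradient averaging


threads = [threading.Thread(target=worker, args=(r,)) for r in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print([kind for kind, _ in events])
```

All four "backward" entries are recorded before any "allreduce" entry. The barrier is not reusable as written; a reusable version needs a second turnstile.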
Limiting Concurrent Hardware Access
When multiple AI jobs share a pool of NPUs or GPUs, a semaphore pool manages access to these scarce hardware resources. A job must acquire a semaphore (representing a device) before execution. Cluster schedulers such as Kubernetes with device plugins apply the same counting idea at a larger scale, tracking how many devices are free before placing a job. This enables fair sharing and prevents resource contention that could crash devices or severely degrade performance.
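A device pool along these lines could be sketched as a semaphore that counts free devices paired with a queue that holds their identifiers. The class DevicePool and the device names "npu0" and "npu1" are hypothetical.

```python
import queue
import threading


class DevicePool:
    """Hands out device IDs; a semaphore counts how many are free."""

    def __init__(self, device_ids: list) -> None:
        self._free = queue.SimpleQueue()
        for device_id in device_ids:
            self._free.put(device_id)
        self._permits = threading.Semaphore(len(device_ids))

    def acquire(self, timeout=None):
        """Return a device ID, or None if no device frees up within `timeout`."""
        if not self._permits.acquire(timeout=timeout):
            return None
        return self._free.get()   # never empty here: we hold a permit

    def release(self, device_id) -> None:
        self._free.put(device_id)   # return the device first...
        self._permits.release()     # ...then publish the permit


pool = DevicePool(["npu0", "npu1"])
a = pool.acquire()
b = pool.acquire()
c = pool.acquire(timeout=0.05)   # None: both devices are busy
pool.release(a)
d = pool.acquire(timeout=0.05)   # receives the device that was just released

print(a, b, c, d)
```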
Implementing Custom Parallel Patterns
Beyond standard parallelism, semaphores enable complex patterns like a phaser for multi-stage pipelines or a read-write lock favoring concurrent readers. For instance, in a retrieval-augmented generation (RAG) system, a read-write lock (built with semaphores) allows multiple query threads to concurrently read the vector index while a single maintenance thread periodically updates it, maximizing throughput.
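A readers-preference read-write lock built from two semaphores is sketched below. It relies on the fact that semaphores have no owner: the first reader acquires the shared permit and the last reader, possibly a different thread, releases it. The class name is hypothetical, and under this policy a steady stream of readers can starve the writer.

```python
import threading


class ReadWriteLock:
    """Readers-preference lock built from two semaphores (writers may starve)."""

    def __init__(self) -> None:
        self._readers = 0
        self._readers_mutex = threading.Semaphore(1)   # protects _readers
        self._room_empty = threading.Semaphore(1)      # held by a writer or by the reader group

    def acquire_read(self) -> None:
        with self._readers_mutex:
            self._readers += 1
            if self._readers == 1:
                self._room_empty.acquire()   # first reader locks writers out

    def release_read(self) -> None:
        with self._readers_mutex:
            self._readers -= 1
            if self._readers == 0:
                self._room_empty.release()   # last reader lets writers in

    def acquire_write(self) -> None:
        self._room_empty.acquire()

    def release_write(self) -> None:
        self._room_empty.release()


index_lock = ReadWriteLock()

index_lock.acquire_read()    # query thread 1 reads the index
index_lock.acquire_read()    # query thread 2 reads concurrently
index_lock.release_read()
index_lock.release_read()

index_lock.acquire_write()   # maintenance thread updates exclusively
index_lock.release_write()
```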
Frequently Asked Questions
A semaphore is a fundamental synchronization primitive used in parallel computing to control access to shared resources. These questions address its core mechanics, use cases, and relationship to other concurrency concepts.
A semaphore is a synchronization variable that uses an internal counter to control access to a shared resource by multiple threads or processes. It works by maintaining a count that represents the number of available resource units. Threads call wait() (or P) to decrement the count, acquiring a permit; if the count is zero, the calling thread blocks until a permit becomes available. Threads call signal() (or V) to increment the count, releasing a permit and potentially unblocking a waiting thread. This mechanism ensures that no more than a specified number of concurrent accesses occur, preventing resource exhaustion and race conditions.
Key Mechanism:
- Initialization: The semaphore is created with a non-negative integer value (e.g., sem_init(&sem, 0, N)).
- Atomic Operations: The wait() and signal() operations are atomic, meaning they complete without interruption, which is crucial for correctness in concurrent environments.
- Blocking/Non-blocking: A wait() on a zero-count semaphore typically blocks the thread, putting it into a waiting queue managed by the operating system or runtime.
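The blocking and non-blocking variants can be compared in a short Python sketch, using threading.Semaphore in place of the POSIX calls. The timeouts are arbitrary, and a Timer thread plays the role of the other thread that eventually signals.

```python
import threading

sem = threading.Semaphore(0)             # initialized with zero permits

got_now = sem.acquire(blocking=False)    # try-wait: returns at once
got_soon = sem.acquire(timeout=0.05)     # timed wait: gives up after 50 ms

# Another thread signals shortly; the blocked acquire wakes when it does.
threading.Timer(0.05, sem.release).start()
got_later = sem.acquire(timeout=2.0)

print(got_now, got_soon, got_later)      # False False True
```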
Related Terms
Semaphores are one of several fundamental constructs used to coordinate concurrent execution. Understanding related primitives is essential for designing correct and efficient parallel systems.
Mutex (Mutual Exclusion)
A mutex is a synchronization primitive that enforces mutual exclusion, allowing only one thread at a time to access a shared resource or critical section of code. Unlike a semaphore, which can permit multiple concurrent accesses, a mutex is a binary semaphore with ownership semantics, meaning the thread that locks it must be the one to unlock it.
- Key Difference: A mutex is a locking mechanism; a semaphore is a signaling mechanism.
- Use Case: Protecting a shared data structure from concurrent modification.
- Example: In a multi-threaded server, a mutex guards a global configuration object.
Condition Variable
A condition variable is a synchronization primitive that enables threads to wait for a specific predicate (condition) to become true. It is always used in conjunction with a mutex to avoid race conditions. Threads can wait on the condition variable, and other threads can signal or broadcast to wake up waiting threads.
- Mechanism: wait(), signal(), and broadcast() operations.
- Use Case: Implementing producer-consumer queues or thread pools where threads wait for work.
- Example: A rendering thread waits on a condition variable until a new frame buffer is ready.
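The rendering example might be sketched with Python's threading.Condition, where notify() corresponds to signal(). The variable and function names are illustrative.

```python
import threading

frame_cond = threading.Condition()   # condition variable with its own mutex
frame_ready = False                  # the predicate, protected by that mutex
rendered = []


def renderer() -> None:
    with frame_cond:
        while not frame_ready:       # re-check the predicate after every wakeup
            frame_cond.wait()        # releases the mutex while sleeping
        rendered.append("frame drawn")


def producer() -> None:
    global frame_ready
    with frame_cond:
        frame_ready = True           # change the predicate under the mutex
        frame_cond.notify()          # wake one waiting thread


r = threading.Thread(target=renderer)
p = threading.Thread(target=producer)
r.start()
p.start()
r.join()
p.join()

print(rendered)   # ['frame drawn']
```

Unlike a semaphore, the condition variable remembers nothing: if the producer runs first, the renderer proceeds only because it checks frame_ready before waiting.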
Barrier Synchronization
Barrier synchronization is a coordination mechanism that forces all participating threads or processes in a parallel computation to reach a specific point in the code (the barrier) before any can proceed further. It is used to synchronize phases of a parallel algorithm.
- Contrast with Semaphore: A barrier enforces that all N threads meet; a semaphore controls how many of N threads can proceed.
- Use Case: Synchronizing between iterations in a parallel simulation or before a collective data exchange.
- Example: In parallel matrix multiplication, threads compute their tile, then all wait at a barrier before the next phase begins.
Atomic Operations
Atomic operations are indivisible read-modify-write instructions (e.g., Compare-and-Swap, fetch-and-add) that complete without interruption from other threads. They are the foundation for building higher-level synchronization primitives like semaphores and mutexes, and for implementing lock-free algorithms.
- Key Property: Guarantees linearizability for a single memory location.
- Hardware Support: Implemented via CPU instructions (e.g., LOCK prefix on x86, LDREX/STREX on ARM).
- Example: Incrementing a shared counter safely without using a full lock.
Spinlock
A spinlock is a type of mutex where a thread repeatedly checks (or "spins") in a loop until the lock becomes available. It is a busy-wait synchronization primitive. Spinlocks are efficient for very short critical sections where the cost of a context switch would be higher than the spinning time.
- Trade-off: Low latency vs. wasted CPU cycles.
- Implementation: Often built using atomic operations like Compare-and-Swap.
- Use Case: Protecting a short, frequently accessed data structure in kernel or low-latency user-space code.
Monitor
A monitor is a high-level synchronization construct that encapsulates shared data, the procedures that operate on it, and the synchronization mechanisms (typically a mutex and one or more condition variables) needed to control access. It provides mutual exclusion automatically for its procedures.
- Concept: Combines data, operations, and synchronization into a single module.
- Contrast: A semaphore is a lower-level, decentralized signaling tool; a monitor is a structured, object-oriented concurrency abstraction.
- Example: A thread-safe queue class where enqueue and dequeue methods are monitor procedures.
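A monitor-style queue could look like the following sketch, in which every public method runs under the same lock and waiting is done on a condition tied to that lock. The class name MonitorQueue and the item "job-1" are hypothetical.

```python
import threading
from collections import deque


class MonitorQueue:
    """Thread-safe queue in monitor style: data, operations, and locking together."""

    def __init__(self) -> None:
        self._items = deque()
        self._not_empty = threading.Condition()   # lock plus wait set

    def enqueue(self, item) -> None:
        with self._not_empty:             # enter the monitor
            self._items.append(item)
            self._not_empty.notify()      # wake one waiting dequeuer

    def dequeue(self):
        with self._not_empty:             # enter the monitor
            while not self._items:
                self._not_empty.wait()    # leave the monitor while waiting
            return self._items.popleft()


q = MonitorQueue()
results = []
consumer = threading.Thread(target=lambda: results.append(q.dequeue()))
consumer.start()
q.enqueue("job-1")
consumer.join()

print(results)   # ['job-1']
```

Callers never touch the lock directly, which is the main practical advantage of a monitor over hand-placed semaphore operations.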

About the author
Prasad Kumkar
CEO & MD, Inference Systems
Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over 5+ years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.
His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.