Inferensys

Semaphore

A semaphore is a synchronization primitive used in concurrent programming to control access to a shared resource by multiple threads or processes, using an internal counter to permit a limited number of concurrent accesses.
SYNCHRONIZATION PRIMITIVE

What is a Semaphore?

A semaphore is a foundational synchronization mechanism in concurrent programming used to control access to shared resources.

A semaphore is a synchronization variable that maintains a non-negative integer count to control access to a finite pool of shared resources, such as memory buffers, database connections, or hardware units. It provides two atomic operations: wait (or P) decrements the count, potentially blocking the calling thread if the count is zero, and signal (or V) increments the count, potentially unblocking a waiting thread. This mechanism enforces mutual exclusion and limits concurrency to a specified degree, preventing resource exhaustion and race conditions in multi-threaded or multi-process environments.

Semaphores are classified as counting semaphores, which can represent multiple available resources, or binary semaphores, whose count is limited to 0 and 1 and which can serve as a simple lock. They are a low-level primitive from which higher-level concurrency constructs such as locks, barriers, and bounded queues can be built. In NPU acceleration and parallel computing, semaphores coordinate work between thread blocks or manage access to on-chip shared memory and hardware queues, helping to prevent data corruption when multiple processing elements contend for the same accelerator resource.
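The limit on concurrency described above can be sketched in a few lines. This is a minimal illustration using Python's standard threading.Semaphore; the function name run_workers, the worker count, and the sleep that stands in for real resource use are assumptions made for the example.

```python
import threading
import time


def run_workers(num_workers: int, permits: int) -> int:
    """Run num_workers threads; return the peak number inside the guarded region."""
    sem = threading.Semaphore(permits)  # counter starts at `permits`
    state_lock = threading.Lock()       # protects the bookkeeping counters below
    active = 0
    peak = 0

    def worker() -> None:
        nonlocal active, peak
        with sem:  # wait (P): blocks while the count is zero
            with state_lock:
                active += 1
                peak = max(peak, active)
            time.sleep(0.01)  # hypothetical stand-in for using the shared resource
            with state_lock:
                active -= 1
        # leaving the `with sem` block performs signal (V)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak
```

Calling run_workers(8, 3) starts eight threads, but the returned peak never exceeds three, because only three permits exist.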

Core Characteristics of Semaphores

A semaphore is a synchronization variable used to control access to a common resource by multiple threads, maintaining a count to permit a specified number of concurrent accesses.

01

The Internal Counter

The core of a semaphore is an integer counter. This value represents the number of available resource units or permits.

  • Positive Value: Indicates the number of threads that can acquire the semaphore without blocking.
  • Zero Value: Means no permits are available; the next thread to call acquire() will block.
  • Negative Value: Not used in classical semaphores; the count is typically non-negative. The blocking behavior for a zero count is managed by the semaphore's wait queue, not by a negative integer.
02

Atomic Operations: Wait (P) and Signal (V)

Semaphores are manipulated via two fundamental, atomic (indivisible) operations, ensuring thread-safe updates to the internal counter.

  • Wait (P/proberen) or acquire(): If the count is greater than zero, decrements it and returns immediately. If the count is zero, the calling thread is blocked and placed in a wait queue until a permit is released.
  • Signal (V/verhogen) or release(): Increments the semaphore count. If threads are waiting in the queue, one is typically awakened and allowed to proceed.

These operations form the basis for all higher-level synchronization patterns built with semaphores.
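To make the two operations concrete, here is a small teaching sketch of a semaphore built on Python's threading.Condition. The class name SimpleSemaphore and its count property are hypothetical; production code would normally use the library's own semaphore instead.

```python
import threading


class SimpleSemaphore:
    """Teaching sketch of a counting semaphore (not a library API)."""

    def __init__(self, initial: int) -> None:
        if initial < 0:
            raise ValueError("initial count must be non-negative")
        self._count = initial
        self._cond = threading.Condition()  # lock plus wait queue

    def wait(self) -> None:
        """P: block while the count is zero, then take one permit."""
        with self._cond:
            while self._count == 0:
                self._cond.wait()
            self._count -= 1

    def signal(self) -> None:
        """V: return one permit and wake one waiting thread, if any."""
        with self._cond:
            self._count += 1
            self._cond.notify()

    @property
    def count(self) -> int:
        with self._cond:
            return self._count
```

The counter never drops below zero here: a thread that finds the count at zero sleeps on the condition's wait queue, and the check-and-decrement happens under the condition's lock, which is what makes each operation atomic with respect to the others.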
03

Binary vs. Counting Semaphores

Semaphores are categorized by the range of their internal counter.

  • Binary Semaphore: The counter is restricted to values 0 and 1. It can act as a simple lock, guaranteeing mutual exclusion for a single resource. Unlike a mutex, it does not track ownership: by convention the thread that acquires it also releases it, but nothing enforces this.
  • Counting Semaphore: The counter can take any non-negative integer value. It is used to control access to a pool of identical resources (e.g., a connection pool of 10 database connections). It does not enforce ownership; any thread can signal a semaphore that another thread acquired.
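The counting behavior and the lack of ownership can be observed directly. The sketch below assumes Python's threading.BoundedSemaphore, which additionally rejects a release that would push the count above its initial value; the variable names are illustrative.

```python
import threading

pool = threading.BoundedSemaphore(3)  # counting semaphore with three permits

# Four non-blocking attempts: the first three succeed, the fourth finds the count at zero.
attempts = [pool.acquire(blocking=False) for _ in range(4)]

# No ownership: a different thread may release a permit that the main thread acquired.
helper = threading.Thread(target=pool.release)
helper.start()
helper.join()
regained = pool.acquire(blocking=False)  # succeeds because the helper returned a permit

# Return the three permits still held by the main thread.
for _ in range(3):
    pool.release()

# A bounded semaphore refuses to exceed its initial count.
try:
    pool.release()
    over_release_rejected = False
except ValueError:
    over_release_rejected = True
```

After this runs, attempts is [True, True, True, False], regained is True, and over_release_rejected is True.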
04

Synchronization vs. Mutual Exclusion

While often used to implement mutexes, semaphores are a more general primitive for synchronization.

  • Mutual Exclusion: A binary semaphore can ensure only one thread enters a critical section at a time.
  • Event Signaling / Condition Synchronization: A semaphore (often initialized to 0) can make one thread wait for an event signaled by another. Thread A waits on the semaphore; Thread B performs work and then signals, allowing A to proceed. This coordinates the order of execution between threads.
  • Resource Pool Management: A counting semaphore directly models the availability of multiple resource instances.
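The event-signaling pattern can be sketched with a semaphore initialized to 0. The function name run_handoff and the log messages are assumptions made for this example.

```python
import threading


def run_handoff() -> list[str]:
    """Thread A waits for an event that thread B signals."""
    ready = threading.Semaphore(0)  # zero permits: the first acquire must wait
    log: list[str] = []

    def thread_a() -> None:
        ready.acquire()  # blocks until B signals
        log.append("A: proceeds after the event")

    def thread_b() -> None:
        log.append("B: work done")
        ready.release()  # signal the event

    a = threading.Thread(target=thread_a)
    b = threading.Thread(target=thread_b)
    a.start()
    b.start()
    a.join()
    b.join()
    return log
```

Regardless of which thread the scheduler runs first, B's entry appears before A's, because A cannot pass acquire() until B has called release().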
05

The Wait Queue and Scheduling

When a thread executes acquire() on a semaphore with a zero count, it is blocked and placed into an associated wait queue. The operating system scheduler removes it from the ready-to-run state.

  • Queue Policy: The order of thread awakening (FIFO, priority-based) is implementation-defined but is crucial for avoiding starvation.
  • Awakening: A release() operation by another thread increments the count and typically selects one waiting thread from the queue to unblock and resume execution. The awakened thread's acquire() then completes by consuming the permit that was just released, so the count returns to zero unless other release() calls have occurred in the meantime.
06

Key Distinction from Mutexes and Condition Variables

Understanding how semaphores differ from other primitives is critical for correct usage.

  • Vs. Mutex: A mutex has a notion of ownership; the thread that locks it must unlock it. A semaphore has no ownership; any thread can signal it. This makes semaphores more flexible but also more prone to programming errors.
  • Vs. Condition Variable: A condition variable is always used with a mutex to protect a shared condition (predicate). A semaphore internally manages its own condition (count > 0) and waiting mechanism, making it self-contained but less expressive for complex conditions.
  • Historical Context: Semaphores were defined by Edsger Dijkstra in 1965, predating many modern synchronization constructs and forming their theoretical foundation.
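The ownership difference can be demonstrated in Python, with one caveat stated up front: Python's plain threading.Lock does not enforce ownership, so this sketch uses threading.RLock as a stand-in for an owning mutex. The helper release_in_other_thread is hypothetical.

```python
import threading


def release_in_other_thread(primitive) -> str:
    """Call primitive.release() from a new thread and report what happened."""
    outcome: list[str] = []

    def target() -> None:
        try:
            primitive.release()
            outcome.append("released")
        except RuntimeError:
            outcome.append("RuntimeError")

    t = threading.Thread(target=target)
    t.start()
    t.join()
    return outcome[0]


# Owning lock: only the thread that acquired it may release it.
owned = threading.RLock()
owned.acquire()
rlock_result = release_in_other_thread(owned)
owned.release()  # the owner releases it normally

# Semaphore: any thread may signal it.
sem = threading.Semaphore(1)
sem.acquire()
sem_result = release_in_other_thread(sem)
```

Here rlock_result is "RuntimeError" and sem_result is "released", which mirrors the flexibility (and the risk) described in the comparison above.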

How Semaphores Work: The Wait and Signal Operations

A semaphore is a fundamental synchronization primitive used in concurrent programming to control access to shared resources, such as memory regions or hardware units, across multiple threads or processes.

A semaphore is a synchronization variable that maintains an internal integer count, representing the number of available units of a shared resource. The core operations are wait (or P) and signal (or V). The wait() operation decrements the count if it is greater than zero; if the count is zero, the calling thread is blocked until another thread signals. The signal() operation increments the count and may unblock a waiting thread. This mechanism ensures that at most a specified number of threads can access a resource concurrently, and it enforces mutual exclusion when configured as a binary semaphore (count of 1).

In parallel computing and NPU acceleration, semaphores coordinate access to shared memory buffers, hardware queues, or computational units across multiple thread blocks or streaming multiprocessors. They provide more flexible control over resource pools than a mutex, which admits only one holder at a time. Semaphores do not by themselves prevent deadlock or starvation: a missed signal() or an inconsistent acquisition order can cause either, so wait and signal calls must be paired carefully. Semaphores are implemented using atomic operations to protect the integrity of the count variable, which is crucial on multi-core systems.

Semaphore vs. Mutex vs. Condition Variable

A comparison of three fundamental concurrency control mechanisms used to coordinate access to shared resources and manage thread execution order in parallel systems.

| Feature | Semaphore | Mutex | Condition Variable |
| --- | --- | --- | --- |
| Primary Purpose | Controls access to a pool of identical resources | Enforces exclusive access to a single resource (mutual exclusion) | Enables threads to wait for a specific program condition to become true |
| Internal State | Non-negative integer counter | Boolean (locked/unlocked) with an owning thread | No counter or flag of its own; relies on an associated predicate and mutex |
| Ownership Concept | No ownership; any thread can call post() or wait() | Ownership; only the thread that locked it can unlock it | No ownership; used in conjunction with a mutex that provides the lock context |
| Typical Use Case | Limiting concurrent access (e.g., connection pools, rate limiting) | Protecting a critical section of code or a shared data structure | Implementing producer-consumer queues, thread rendezvous, or waiting for complex states |
| Synchronization Granularity | Resource-level (N concurrent accesses) | Code/data-level (1 exclusive access) | Condition-level (wait/signal on predicate changes) |
| Signaling Mechanism | post() (or signal()) increments the counter; can be called by any thread | No separate signaling operation; unlock() releases the lock and lets one blocked thread acquire it | signal() or broadcast() notifies one or all waiting threads; waiters must hold the associated mutex when calling wait() |
| Common Operations | wait() (P), post() (V), try_wait() | lock(), unlock(), try_lock() | wait(mutex), signal(), broadcast() |
| Risk of Priority Inversion | Possible; hard to mitigate because there is no owner to which priority inheritance could apply | Possible; commonly mitigated with priority inheritance or priority ceiling protocols | Inherits risk from the associated mutex |

Semaphore Use Cases in AI/ML Systems

A semaphore is a synchronization variable used to control concurrent access to shared resources. In AI/ML systems, semaphores are useful for managing parallelism, preventing race conditions, and coordinating execution across shared hardware.

01

Controlling Access to Shared Model Weights

During online learning or federated averaging, multiple worker threads or processes may need to read and update a central model. A semaphore limits the number of concurrent writers. For example, a semaphore with a count of 1 (a binary semaphore) acts as a mutex to ensure only one worker performs a gradient update at a time, maintaining weight consistency. A count greater than 1 only bounds how many workers update at once and does not by itself prevent conflicting writes to the same weights.
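A minimal sketch of the single-writer case follows. The two-element weight list, the constant placeholder gradient, and the function name train are assumptions for illustration; real training code would compute gradients from data.

```python
import threading


def train(num_workers: int, steps_per_worker: int, lr: float = 0.1) -> list[float]:
    """Workers apply updates to shared weights, one writer at a time."""
    weights = [0.0, 0.0]                   # hypothetical shared model
    write_permit = threading.Semaphore(1)  # binary semaphore used as a lock

    def worker() -> None:
        for _ in range(steps_per_worker):
            grad = [1.0, -1.0]  # placeholder gradient, not computed from data
            with write_permit:  # only one worker updates at a time
                for i, g in enumerate(grad):
                    weights[i] -= lr * g

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return weights
```

With 4 workers, 100 steps each, and lr=0.5, every one of the 400 updates is applied exactly once, so the result is [-200.0, 200.0].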

02

Managing Inference Request Throttling

In high-throughput serving systems (e.g., Triton Inference Server), a semaphore can throttle the number of concurrent inference requests processed by a GPU. This prevents out-of-memory (OOM) errors by ensuring the device's memory capacity is not exceeded by too many simultaneous model executions. The semaphore count is often tuned based on the model's memory footprint and batch size.
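The throttling idea can be sketched with Python's asyncio.Semaphore. This is not Triton's API: the serve and handle functions are hypothetical, and the short sleep stands in for a model execution on the device.

```python
import asyncio


async def serve(num_requests: int, max_concurrent: int) -> int:
    """Process requests with at most max_concurrent in flight; return the peak."""
    limiter = asyncio.Semaphore(max_concurrent)
    in_flight = 0
    peak = 0

    async def handle(request_id: int) -> None:
        nonlocal in_flight, peak
        async with limiter:  # waits here while all permits are taken
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # hypothetical stand-in for model execution
            in_flight -= 1

    await asyncio.gather(*(handle(i) for i in range(num_requests)))
    return peak
```

Running asyncio.run(serve(10, 2)) accepts all ten requests but never executes more than two at once. In practice the limit would be chosen from the model's memory footprint and batch size, as noted above.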

03

Orchestrating Data Pipeline Stages

A producer-consumer pattern is common in data loading pipelines. Semaphores coordinate the flow between stages:

  • A full semaphore tracks the number of filled slots in a bounded buffer.
  • An empty semaphore tracks available empty slots.

This prevents a fast image augmentation stage from overwhelming a slower training stage, ensuring smooth, deadlock-free data flow and optimal GPU utilization.
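The full/empty pair can be sketched as a small bounded buffer. The class name BoundedBuffer, the None sentinel that ends the stream, and the integer "batches" are assumptions for this example; a separate lock protects the underlying queue.

```python
import threading
from collections import deque


class BoundedBuffer:
    """Bounded queue coordinated by `empty` and `full` semaphores."""

    def __init__(self, capacity: int) -> None:
        self._items: deque = deque()
        self._mutex = threading.Lock()               # protects the deque itself
        self._empty = threading.Semaphore(capacity)  # free slots
        self._full = threading.Semaphore(0)          # filled slots

    def put(self, item) -> None:
        self._empty.acquire()  # wait for a free slot
        with self._mutex:
            self._items.append(item)
        self._full.release()   # announce one more filled slot

    def get(self):
        self._full.acquire()   # wait for a filled slot
        with self._mutex:
            item = self._items.popleft()
        self._empty.release()  # announce one more free slot
        return item


def run_pipeline(num_batches: int, capacity: int) -> list:
    buf = BoundedBuffer(capacity)
    consumed: list = []

    def producer() -> None:
        for i in range(num_batches):
            buf.put(i)   # blocks whenever the buffer is full
        buf.put(None)    # sentinel: no more batches

    def consumer() -> None:
        while (item := buf.get()) is not None:
            consumed.append(item)

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return consumed
```

With one producer and one consumer, run_pipeline(20, 4) returns the batches 0 through 19 in order, and the producer never gets more than four items ahead of the consumer.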
04

Synchronizing Distributed Training Steps

In synchronous distributed training, a barrier ensures that all workers have contributed their gradients before any worker applies the averaged update. On a single host, such a barrier can be built from semaphores; frameworks such as Horovod or PyTorch DDP achieve the same synchronization through collective communication (the All-Reduce operation) rather than explicit semaphores. This guarantees all model replicas update from the same synchronized state, which is critical for convergence. A missed synchronization can lead to gradient staleness and training divergence.
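As an illustration of the single-host construction only, the sketch below builds a one-shot barrier from two semaphores. The class OneShotBarrier and the run_step helper are hypothetical, and Python already ships threading.Barrier for real use; this is not how Horovod or DDP are implemented.

```python
import threading


class OneShotBarrier:
    """Single-use barrier built from two semaphores (teaching sketch)."""

    def __init__(self, parties: int) -> None:
        self._parties = parties
        self._arrived = 0
        self._mutex = threading.Semaphore(1)      # protects the arrival count
        self._turnstile = threading.Semaphore(0)  # closed until everyone arrives

    def wait(self) -> None:
        with self._mutex:
            self._arrived += 1
            last = self._arrived == self._parties
        if last:
            # The last arrival opens the turnstile once for every party.
            for _ in range(self._parties):
                self._turnstile.release()
        self._turnstile.acquire()


def run_step(num_workers: int) -> list[str]:
    barrier = OneShotBarrier(num_workers)
    log: list[str] = []

    def worker(rank: int) -> None:
        log.append(f"backward {rank}")  # placeholder for the backward pass
        barrier.wait()
        log.append(f"reduce {rank}")    # placeholder for the gradient averaging

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

In the log returned by run_step(4), all four "backward" entries precede every "reduce" entry, because no thread passes the turnstile until the last worker has arrived.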

05

Limiting Concurrent Hardware Access

When multiple AI jobs share a pool of NPUs or GPUs, a counting semaphore can manage access to these scarce hardware resources, with each permit representing one device. A job must acquire a permit before execution and release it afterward. The same permit-counting idea appears in cluster schedulers such as Kubernetes with device plugins, which track how many devices are free before placing a job, enabling fair sharing and preventing resource contention that could crash devices or severely degrade performance.
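A device pool of this kind might look like the following sketch. The DevicePool class, the device names, and the run_jobs helper are hypothetical; the semaphore counts free devices while a lock protects the list that records which ones they are.

```python
import threading
import time


class DevicePool:
    """Hands out device ids; the semaphore counts how many are free."""

    def __init__(self, device_ids) -> None:
        self._free = list(device_ids)
        self._lock = threading.Lock()
        self._permits = threading.Semaphore(len(self._free))

    def acquire(self):
        self._permits.acquire()  # blocks while no device is free
        with self._lock:
            return self._free.pop()

    def release(self, device_id) -> None:
        with self._lock:
            self._free.append(device_id)
        self._permits.release()


def run_jobs(num_jobs: int, device_ids) -> dict:
    pool = DevicePool(device_ids)
    stats_lock = threading.Lock()
    busy: set = set()
    stats = {"peak": 0, "conflicts": 0}

    def job() -> None:
        dev = pool.acquire()
        with stats_lock:
            if dev in busy:  # would mean two jobs share one device
                stats["conflicts"] += 1
            busy.add(dev)
            stats["peak"] = max(stats["peak"], len(busy))
        time.sleep(0.005)  # hypothetical stand-in for running on the device
        with stats_lock:
            busy.discard(dev)
        pool.release(dev)

    threads = [threading.Thread(target=job) for _ in range(num_jobs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return stats
```

With run_jobs(10, ["npu0", "npu1"]), at most two jobs run at once and no device is ever handed to two jobs at the same time.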

06

Implementing Custom Parallel Patterns

Beyond standard parallelism, semaphores enable complex patterns like a phaser for multi-stage pipelines or a read-write lock favoring concurrent readers. For instance, in a retrieval-augmented generation (RAG) system, a read-write lock (built with semaphores) allows multiple query threads to concurrently read the vector index while a single maintenance thread periodically updates it, maximizing throughput.
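One classic way to build such a read-write lock from semaphores is the readers-preference scheme sketched below. The class ReadWriteLock and its method names are assumptions for this example. Note that this variant can starve the writer if readers keep arriving, so a real index-maintenance thread may need a fairer design.

```python
import threading


class ReadWriteLock:
    """Readers-preference read-write lock built from two semaphores (sketch)."""

    def __init__(self) -> None:
        self._readers = 0
        self._readers_mutex = threading.Semaphore(1)  # protects the reader count
        self._resource = threading.Semaphore(1)       # held by the writer or the reader group

    def acquire_read(self) -> None:
        with self._readers_mutex:
            self._readers += 1
            if self._readers == 1:
                self._resource.acquire()  # first reader locks out writers

    def release_read(self) -> None:
        with self._readers_mutex:
            self._readers -= 1
            if self._readers == 0:
                self._resource.release()  # last reader lets writers in

    def acquire_write(self) -> None:
        self._resource.acquire()

    def try_acquire_write(self) -> bool:
        """Non-blocking variant: returns False if readers or a writer hold the lock."""
        return self._resource.acquire(blocking=False)

    def release_write(self) -> None:
        self._resource.release()
```

Any number of query threads can hold the read side together, while try_acquire_write() returns False until the last reader has left.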

Frequently Asked Questions

A semaphore is a fundamental synchronization primitive used in parallel computing to control access to shared resources. These questions address its core mechanics, use cases, and relationship to other concurrency concepts.

A semaphore is a synchronization variable that uses an internal counter to control access to a shared resource by multiple threads or processes. It works by maintaining a count that represents the number of available resource units. Threads call wait() (or P) to decrement the count, acquiring a permit; if the count is zero, the calling thread blocks until a permit becomes available. Threads call signal() (or V) to increment the count, releasing a permit and potentially unblocking a waiting thread. This mechanism ensures that no more than a specified number of concurrent accesses occur, preventing resource exhaustion and race conditions.

Key Mechanism:

  • Initialization: The semaphore is created with a non-negative integer value (e.g., sem_init(&sem, 0, N)).
  • Atomic Operations: The wait() and signal() operations are atomic, meaning they complete without interruption, which is crucial for correctness in concurrent environments.
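  • Blocking/Non-blocking: A wait() on a zero-count semaphore typically blocks the thread, putting it into a waiting queue managed by the operating system or runtime. Non-blocking and timed variants (such as sem_trywait and sem_timedwait in POSIX) instead return immediately, or after a deadline, with an indication that no permit was available.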
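The blocking, non-blocking, and timed forms can be compared side by side. This sketch uses Python's threading.Semaphore rather than the POSIX calls named above; the variable names are illustrative.

```python
import threading

sem = threading.Semaphore(1)  # created with one permit, analogous to an initial value of 1

first = sem.acquire()                 # permit available: returns True at once, count 1 -> 0
second = sem.acquire(blocking=False)  # non-blocking: returns False immediately
third = sem.acquire(timeout=0.05)     # timed: gives up after about 50 ms and returns False

sem.release()                         # count 0 -> 1
fourth = sem.acquire(blocking=False)  # succeeds now that a permit is available
```

After this runs, first and fourth are True while second and third are False.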
Prasad Kumkar

About the author

Prasad Kumkar

CEO & MD, Inference Systems

Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over 5+ years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.

His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.