Statistical Process Control (SPC) is a method of quality control that uses statistical techniques, primarily control charts, to monitor, control, and improve a process by distinguishing between common-cause variation (inherent noise) and special-cause variation (anomalous signals). In LLM performance monitoring, SPC is applied to metrics like latency, throughput, and output quality scores to detect deviations from stable, predictable behavior, enabling proactive issue identification before user impact occurs.

What is Statistical Process Control (SPC)?
Statistical Process Control (SPC) is a foundational quality control methodology applied in LLM operations to ensure stable, predictable model behavior by monitoring performance metrics.
The core mechanism involves establishing a baseline of normal process behavior from historical data to calculate control limits. Real-time data points are then plotted on a control chart; points falling outside the limits or forming non-random patterns signal potential special-cause variation requiring investigation. This provides a quantitative, data-driven framework for anomaly detection and root cause analysis, moving LLM operations from reactive firefighting to proactive, statistical governance of model performance.
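The two phases described above can be sketched as follows, assuming a service that already collects one value per interval (for example, P99 latency in milliseconds). For simplicity it estimates σ with the sample standard deviation; individuals charts often estimate σ from the average moving range instead. The names `ControlLimits`, `fit_limits`, and `out_of_control` are hypothetical.

```python
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlLimits:
    center: float  # center line: baseline mean
    lcl: float     # lower control limit
    ucl: float     # upper control limit


def fit_limits(baseline: list[float], k: float = 3.0) -> ControlLimits:
    """Phase I: estimate the center line and +/- k-sigma limits from stable history."""
    mean = statistics.fmean(baseline)
    sigma = statistics.stdev(baseline)  # sample standard deviation (n - 1)
    return ControlLimits(center=mean, lcl=mean - k * sigma, ucl=mean + k * sigma)


def out_of_control(points: list[float], limits: ControlLimits) -> list[int]:
    """Phase II: indices of new points that fall outside the control limits."""
    return [i for i, x in enumerate(points) if x < limits.lcl or x > limits.ucl]


baseline_p99_ms = [820, 790, 805, 812, 798, 801, 795, 809]  # hypothetical history
limits = fit_limits(baseline_p99_ms)
print(out_of_control([803, 811, 1250, 799], limits))  # prints [2]
```

With this baseline the limits are roughly 774 to 833 ms, so only the 1250 ms point is flagged.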
Core Components of SPC for LLMs
Statistical Process Control (SPC) provides a rigorous, data-driven framework for monitoring LLM performance. These core components translate manufacturing quality principles into the observability of generative AI systems.
Control Charts
A control chart is a time-series graph with a central line representing the process mean and upper/lower control limits (UCL/LCL) calculated from historical data. It is the primary tool for distinguishing common-cause variation (inherent to the process) from special-cause variation (indicating an anomaly). For LLMs, key metrics plotted include:
- Tokens per Second (TPS) to monitor throughput stability.
- P99 Latency to detect tail latency degradation.
- Output Embedding Cosine Similarity to track semantic drift from a golden dataset baseline.
Control limits are typically set at ±3 standard deviations (3σ) from the mean, establishing statistical boundaries for expected performance.
Process Capability Analysis
Process capability quantifies how well an LLM's performance metrics meet specified requirements or Service Level Objectives (SLOs). It is expressed through indices like Cp and Cpk. Cp measures potential capability by comparing the width of the specification limits (USL − LSL) to the natural process variation (6σ). Cpk also accounts for how well the process is centered, using the distance from the mean to the nearer specification limit divided by 3σ. For example, an SLO stating "P95 latency must be < 2 seconds" sets a one-sided upper specification limit (USL) of 2 seconds; with no lower limit, Cp is undefined and capability is judged by Cpk = (USL − μ)/3σ. A Cpk < 1.0 means the metric's natural spread extends past the SLO, so the LLM cannot reliably meet it without inference optimization or a hardware upgrade. Capability indices are only meaningful once the metric is in statistical control.
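The indices can be computed directly from their definitions. This sketch assumes the metric is in statistical control and roughly normal (latency is often right-skewed, so treat the result as approximate). The function name `capability` and the window values are hypothetical.

```python
import statistics


def capability(samples, lsl=None, usl=None):
    """Return (Cp, Cpk) for samples against optional lower/upper spec limits.

    Cp = (USL - LSL) / 6σ needs both limits, so it is None for a one-sided spec.
    Cpk = min(Cpu, Cpl), where Cpu = (USL - μ) / 3σ and Cpl = (μ - LSL) / 3σ.
    """
    if lsl is None and usl is None:
        raise ValueError("at least one specification limit is required")
    mu = statistics.fmean(samples)
    sigma = statistics.stdev(samples)
    cp = (usl - lsl) / (6 * sigma) if lsl is not None and usl is not None else None
    one_sided = []
    if usl is not None:
        one_sided.append((usl - mu) / (3 * sigma))  # Cpu
    if lsl is not None:
        one_sided.append((mu - lsl) / (3 * sigma))  # Cpl
    return cp, min(one_sided)


# Hypothetical per-window P95 latencies (seconds) against the 2 s SLO.
p95_windows = [1.2, 1.4, 1.3, 1.6, 1.5, 1.1, 1.7, 1.4]
cp, cpk = capability(p95_windows, usl=2.0)
print(cp, round(cpk, 2))
```

Here μ = 1.4 s and σ = 0.2 s, so Cpk = (2.0 − 1.4)/(3 × 0.2) = 1.0: the process sits exactly at the capability threshold, and any further upward drift would push Cpk below 1.0.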
Common-Cause vs. Special-Cause Variation
The fundamental premise of SPC is categorizing variation. Common-cause variation is inherent, random noise within a stable system (e.g., normal fluctuation in LLM token generation time due to minor queueing differences). Special-cause variation is non-random, assignable to a specific root cause (e.g., a sudden latency spike from a downstream API failure or a drop in output quality due to a corrupted context window). SPC's goal is to avoid overreacting to common-cause variation while swiftly detecting and investigating special-cause signals. Applying a fix to common-cause variation often increases overall variability, a mistake known as tampering.
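Tampering can be demonstrated with a small simulation in the spirit of W. Edwards Deming's funnel experiment. The process below is stable, with purely common-cause noise (σ = 1); the tampering operator shifts the setting after every observation to cancel the last deviation, much as an engineer might retune a hypothetical serving parameter after each latency reading. The function `output_variance` is illustrative.

```python
import random
import statistics


def output_variance(n: int, tamper: bool, seed: int = 0) -> float:
    """Variance of a stable process (target 0, Gaussian noise with sigma = 1).

    With tamper=True, the setting is adjusted after every point by the
    negative of the deviation just observed, i.e. reacting to common-cause noise.
    """
    rng = random.Random(seed)
    setting = 0.0
    outputs = []
    for _ in range(n):
        y = setting + rng.gauss(0.0, 1.0)  # observed value = setting + noise
        outputs.append(y)
        if tamper:
            setting -= y  # "correct" for the deviation from target 0
    return statistics.pvariance(outputs)


hands_off = output_variance(20_000, tamper=False)
tampered = output_variance(20_000, tamper=True)
print(f"variance ratio: {tampered / hands_off:.2f}")
```

Each tampered output is the difference of two independent noise draws, so its variance is about 2σ², roughly double that of the hands-off process.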
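Run Rules (Western Electric and Nelson Rules)
Run rules are heuristic patterns applied to control charts to detect statistical anomalies beyond a single point outside the control limits. These rules increase sensitivity to process shifts. The rules below are the first four Nelson rules, which extend the original Western Electric rules (for example, the Western Electric mean-shift rule uses eight consecutive points on one side of the center line rather than nine). Key rules for LLM monitoring include: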
- Rule 1: A single point beyond the 3σ control limit.
- Rule 2: Nine consecutive points on the same side of the center line (indicating a mean shift).
- Rule 3: Six consecutive points steadily increasing or decreasing (indicating a trend).
- Rule 4: Fourteen consecutive points alternating up and down (indicating systematic oscillation, perhaps from a faulty load balancer).
Violating any of these rules triggers an alert for root cause analysis.
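The four rules listed above can be checked in a single pass over the series. This sketch assumes the center line and σ come from a Phase I baseline; `run_rule_violations` and its dictionary keys are hypothetical names, and each reported index marks the last point of the offending run.

```python
def run_rule_violations(x: list[float], center: float, sigma: float) -> dict[str, list[int]]:
    """Return {rule_name: [indices where the rule fires]} for the four run rules."""
    hits = {"beyond_3sigma": [], "nine_same_side": [], "six_trend": [], "fourteen_alternating": []}
    for i, v in enumerate(x):
        # Rule 1: a single point beyond the 3-sigma limits.
        if abs(v - center) > 3 * sigma:
            hits["beyond_3sigma"].append(i)
        # Rule 2: nine consecutive points strictly on the same side of the center line.
        if i >= 8:
            w = x[i - 8 : i + 1]
            if all(p > center for p in w) or all(p < center for p in w):
                hits["nine_same_side"].append(i)
        # Rule 3: six consecutive points steadily increasing or decreasing (five steps).
        if i >= 5:
            d = [x[j + 1] - x[j] for j in range(i - 5, i)]
            if all(s > 0 for s in d) or all(s < 0 for s in d):
                hits["six_trend"].append(i)
        # Rule 4: fourteen consecutive points alternating up and down (13 steps).
        if i >= 13:
            d = [x[j + 1] - x[j] for j in range(i - 13, i)]
            if all(d[k] * d[k + 1] < 0 for k in range(len(d) - 1)):
                hits["fourteen_alternating"].append(i)
    return hits


series = [0.1, -0.1] * 7  # hypothetical standardized values
print(run_rule_violations(series, center=0.0, sigma=1.0)["fourteen_alternating"])  # [13]
```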
Stratification
Stratification is the technique of breaking down aggregate performance data into meaningful subgroups or cohorts to identify hidden patterns. Analyzing LLM metrics in aggregate can mask issues specific to a subset of traffic. Effective stratification dimensions include:
- Model Version or Fine-Tune: Comparing SPC charts for v1.2 vs. v1.3.
- Request Cohort: Segmenting by user tier, geographic region, or input complexity.
- Hardware Profile: Separating metrics by GPU instance type (e.g., A100 vs. H100).
- Prompt Template: Tracking performance for different classes of instructions (e.g., summarization vs. code generation).
Stratification often reveals that a global control chart is in control while a key subgroup's chart shows special-cause variation.
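A sketch of per-cohort charts, assuming each observation arrives tagged with a cohort label (here a hypothetical GPU instance type). Limits are fitted per cohort from baseline data, so a cohort-specific shift is not diluted by the spread between cohorts. `per_cohort_flags` is a hypothetical helper.

```python
import statistics
from collections import defaultdict


def per_cohort_flags(baseline, live, k=3.0):
    """Fit mean +/- k*sigma limits per cohort from baseline rows, then flag live rows.

    baseline, live: iterables of (cohort, value) pairs.
    Returns {cohort: [indices into that cohort's live values outside its limits]}.
    """
    base = defaultdict(list)
    for cohort, value in baseline:
        base[cohort].append(value)
    limits = {}
    for cohort, vals in base.items():
        m, s = statistics.fmean(vals), statistics.stdev(vals)
        limits[cohort] = (m - k * s, m + k * s)
    cur = defaultdict(list)
    for cohort, value in live:
        cur[cohort].append(value)
    flags = {}
    for cohort, vals in cur.items():
        if cohort not in limits:
            continue  # no baseline yet for this cohort
        lo, hi = limits[cohort]
        flags[cohort] = [i for i, v in enumerate(vals) if v < lo or v > hi]
    return flags


baseline = [("a100", v) for v in (100, 102, 98, 101, 99)] + [("h100", v) for v in (50, 51, 49, 50, 50)]
live = [("a100", 101), ("a100", 99), ("h100", 50), ("h100", 70)]
print(per_cohort_flags(baseline, live))
print(per_cohort_flags([("all", v) for _, v in baseline], [("all", v) for _, v in live]))
```

The H100 cohort's 70 ms point is flagged against its own limits (about 47.9 to 52.1 ms), while pooled limits over both cohorts (about −4 to 154 ms) do not flag it.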
The Feedback Loop & Corrective Action
SPC is not merely a monitoring system but a closed-loop management process. When special-cause variation is detected, a structured corrective action cycle is initiated:
- Containment: Immediate mitigation (e.g., traffic shift, model rollback).
- Root Cause Analysis (RCA): Using tools like 5 Whys or fishbone diagrams to identify the underlying fault (e.g., a memory leak in the inference server, embedding drift in the RAG index).
- Corrective Action: Implementing a permanent fix (e.g., patching code, retraining on new data).
- Verification: Monitoring the post-fix control chart to confirm the process returns to a state of statistical control.
This loop turns monitoring data into continuous system improvement.
SPC vs. Traditional Threshold-Based Monitoring
A comparison of Statistical Process Control (SPC) and traditional threshold-based monitoring for LLM performance metrics, highlighting their fundamental differences in detecting process instability.
| Feature | Statistical Process Control (SPC) | Traditional Threshold-Based Monitoring |
|---|---|---|
| Core Philosophy | Monitors for statistical signals of process instability and special-cause variation. | Triggers alerts when a metric crosses a static, pre-defined threshold. |
| Detection Capability | Detects subtle shifts, trends, and instability before a metric breaches an operational limit. | Detects only overt violations after a metric has already exceeded a limit. |
| Alert Sensitivity | Proactive; alerts on the pattern of data indicating an emerging problem. | Reactive; alerts only when a problem has already manifested. |
| False Positive Rate | Lower when properly configured, because control limits are derived from the process's own observed variation. | Often higher, because thresholds are set arbitrarily and ignore normal process variation. |
| Data Foundation | Requires historical data to calculate control limits (mean, standard deviation). | Can be implemented with minimal historical context using arbitrary or best-guess limits. |
| Handling of Variation | Explicitly distinguishes between common-cause (inherent) and special-cause (assignable) variation. | Treats all variation beyond the threshold as equally significant. |
| Root Cause Guidance | The pattern that triggers a signal on a control chart (e.g., X-bar, I-MR), such as a shift, trend, or cycle, narrows the search for the assignable cause. | Provides no diagnostic information about the cause of the breach. |
| Adaptability to Process | Control limits are recalculated after a verified process change, reflecting the new level of variation. | Static thresholds require manual review and adjustment to avoid alert fatigue. |
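To make the contrast concrete, consider a hypothetical P99 latency series (milliseconds) whose mean shifts from about 800 ms to about 815 ms. A static alert at a hypothetical 2000 ms threshold never fires, while a same-side run check (the nine-point rule, simplified here to the upper side only) flags the shift. Both functions are illustrative.

```python
def first_static_alert(x, threshold):
    """Index of the first point above a fixed threshold, or None."""
    return next((i for i, v in enumerate(x) if v > threshold), None)


def first_shift_alert(x, center, run=9):
    """Index where `run` consecutive points first sit above the center line, or None."""
    streak = 0
    for i, v in enumerate(x):
        streak = streak + 1 if v > center else 0  # reset on any point at or below center
        if streak >= run:
            return i
    return None


p99_ms = [798, 805, 795, 802, 812, 818, 809, 815, 821, 811, 816, 819, 814]
print(first_static_alert(p99_ms, threshold=2000))  # None
print(first_shift_alert(p99_ms, center=800))        # 11
```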
Key LLM Metrics Monitored with SPC
Statistical Process Control (SPC) applies control charts to monitor the stability of critical LLM performance metrics. These charts distinguish common-cause variation from special-cause anomalies, enabling proactive quality management.
Latency Percentiles (P50, P90, P99)
SPC tracks the distribution of request response times. Control limits are established for key percentiles:
- P50 (Median): The central tendency of latency.
- P90 & P99: Tail latency, critical for user experience.
A point exceeding the upper control limit (UCL) on a P99 chart signals a special-cause event, such as a hardware failure or a pathological input sequence causing unusually long generation times. Trending toward a limit may indicate gradual system degradation.
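Each percentile chart needs one value per time window. A minimal sketch, assuming raw per-request latencies are available and grouped into fixed-size, non-overlapping windows, uses the standard library's `statistics.quantiles`; the helper name `window_percentiles` is hypothetical.

```python
import statistics


def window_percentiles(latencies_ms, window):
    """Return (P50, P90, P99) for each full, non-overlapping window of requests."""
    points = []
    for start in range(0, len(latencies_ms) - window + 1, window):
        # n=100 yields 99 cut points; index k-1 is the k-th percentile.
        cuts = statistics.quantiles(latencies_ms[start:start + window], n=100, method="inclusive")
        points.append((cuts[49], cuts[89], cuts[98]))
    return points


print(window_percentiles(list(range(1, 101)), window=100))
```

Each tuple becomes one point on the P50, P90, and P99 charts respectively, which can then be monitored with individuals-chart limits.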
Tokens per Second (TPS)
This throughput metric measures the generative efficiency of the LLM system. An SPC individuals and moving range (I-MR) chart is commonly used to monitor TPS. A sustained drop below the lower control limit indicates a systemic issue reducing throughput, such as:
- GPU thermal throttling.
- Inefficient batching.
- Increased contention for shared resources (e.g., KV cache memory).
Stable, in-control TPS is essential for predictable infrastructure costing and scaling.
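A sketch of I-MR limits for a TPS series, using the standard constants for a moving range of two consecutive points (E2 ≈ 2.66 and D4 ≈ 3.267). The TPS values are hypothetical.

```python
import statistics

# Standard constants for moving ranges of two consecutive points.
E2 = 2.66    # 3 / d2, where d2 = 1.128
D4 = 3.267   # moving-range UCL factor (the LCL factor D3 is 0)


def imr_limits(x):
    """(LCL, center, UCL) for the individuals (I) and moving-range (MR) charts."""
    mr = [abs(b - a) for a, b in zip(x, x[1:])]  # |x_i - x_(i-1)|
    x_bar = statistics.fmean(x)
    mr_bar = statistics.fmean(mr)
    return {
        "I": (x_bar - E2 * mr_bar, x_bar, x_bar + E2 * mr_bar),
        "MR": (0.0, mr_bar, D4 * mr_bar),
    }


tps = [100, 102, 98, 101, 99]  # hypothetical tokens-per-second samples
print(imr_limits(tps))
```

For this series the individuals chart spans about 92.7 to 107.3 TPS around a mean of 100, and the moving-range chart's upper limit is about 9.0. A sustained run of TPS values below the individuals LCL is the throughput drop described above.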
Output Quality Scores
Automated evaluation metrics are charted to detect drift in output characteristics. Examples include:
- Perplexity (for fluency).
- Embedding cosine similarity against a golden dataset.
- Hallucination score from a dedicated detector.
A run of points on one side of the center line (a "shift") can signal output drift, a change in the distribution of the model's generated text or embeddings, or concept drift, where the relationship between inputs and the desired outputs has changed even though the model itself has not. This often triggers a model retraining or prompt engineering review.
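One way to turn embeddings into a chartable quality score, assuming each golden-dataset prompt has a stored reference embedding and the model's output for that prompt is embedded with the same embedding model. Each evaluation run yields one mean similarity, which becomes one point on the chart. The function names are hypothetical, and how embeddings are produced is outside this sketch.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def golden_similarity(outputs, golden):
    """Mean cosine similarity between output embeddings and their golden references."""
    if len(outputs) != len(golden):
        raise ValueError("need exactly one output embedding per golden item")
    return sum(cosine(o, g) for o, g in zip(outputs, golden)) / len(golden)


run_score = golden_similarity([[1.0, 1.0], [0.0, 2.0]], [[1.0, 0.0], [0.0, 1.0]])
print(round(run_score, 3))
```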
Error Rate & User Feedback
SPC monitors the proportion of failed requests (HTTP 5xx, model generation errors) and negative user feedback signals (e.g., "thumbs-down" ratings). A p-chart (for proportion defective) is used for these attribute data. A point above the UCL on an error rate chart is an immediate alert for a breaking bug or service outage. A rising trend in negative feedback, even within control limits, can provide early warning of declining relevance or safety issues before they cause a major incident.
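A p-chart sketch for error rates with varying request volumes per interval. The limits are p̄ ± 3√(p̄(1 − p̄)/nᵢ), so they widen for low-traffic intervals. For brevity, p̄ is computed from the same intervals being tested; in practice it would come from a stable baseline period. The interval counts are hypothetical.

```python
import math


def p_chart(failures, totals):
    """p-chart with variable subgroup sizes.

    failures[i], totals[i]: failed and total requests in interval i.
    Returns p_bar and, per interval, (lcl, ucl, p_i, out_of_control).
    """
    p_bar = sum(failures) / sum(totals)  # center line: overall failure proportion
    rows = []
    for f, n in zip(failures, totals):
        half_width = 3 * math.sqrt(p_bar * (1 - p_bar) / n)  # limits widen for small n
        lcl = max(0.0, p_bar - half_width)
        ucl = min(1.0, p_bar + half_width)
        p = f / n
        rows.append((lcl, ucl, p, p < lcl or p > ucl))
    return p_bar, rows


p_bar, rows = p_chart(failures=[10, 12, 9, 11, 40], totals=[1000] * 5)
print(p_bar, [flag for *_, flag in rows])
```

Here p̄ = 0.0164, the upper limit for 1,000-request intervals is about 0.028, and only the final interval (4% errors) is flagged.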
Resource Utilization
Hardware efficiency metrics are monitored to ensure cost-effective operation and detect anomalies. Key charts track:
- GPU Utilization %: Sudden drops can indicate software faults; sustained high levels near the UCL may signal the need for scaling.
- Memory Usage (VRAM): Critical for detecting memory leaks, especially in the KV Cache.
- Power Draw (Watts): Unusual patterns can indicate failing hardware or inefficient kernel operations.
Correlating resource charts with performance charts (latency, TPS) is key for root cause analysis.
Input/Output Characteristics
SPC monitors the statistical properties of the data flowing through the system to detect shifts that could affect performance.
- Mean Input Token Length: A significant increase can cascade into longer Time to First Token (TTFT) and higher memory pressure.
- Output Token Length Distribution: Drift here can affect overall system latency and cost-per-request calculations.
- Embedding Drift for retrieval-augmented generation (RAG) inputs.
Monitoring these with control charts helps distinguish between a change in user behavior (a new use case) and a problem with input preprocessing or data ingestion pipelines.
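Mean input token length is naturally charted per subgroup of requests. A sketch of X-bar and R chart limits, assuming subgroups of exactly five consecutive requests and the standard Shewhart constants for n = 5 (A2 = 0.577, D3 = 0, D4 = 2.114); the token counts are hypothetical.

```python
import statistics

# Shewhart constants for subgroups of size n = 5.
A2, D3, D4 = 0.577, 0.0, 2.114


def xbar_r_limits(subgroups):
    """(LCL, center, UCL) for the X-bar and R charts from subgroups of five values."""
    if any(len(g) != 5 for g in subgroups):
        raise ValueError("the constants above assume subgroups of exactly 5")
    means = [statistics.fmean(g) for g in subgroups]
    ranges = [max(g) - min(g) for g in subgroups]
    x_dbar = statistics.fmean(means)  # grand mean: X-bar chart center line
    r_bar = statistics.fmean(ranges)  # average range: R chart center line
    return {
        "xbar": (x_dbar - A2 * r_bar, x_dbar, x_dbar + A2 * r_bar),
        "R": (D3 * r_bar, r_bar, D4 * r_bar),
    }


# Hypothetical input token counts, five consecutive requests per subgroup.
limits = xbar_r_limits([[100, 110, 120, 130, 140], [105, 115, 125, 135, 145]])
print(limits["xbar"], limits["R"])
```

A subgroup mean above the X-bar UCL points to a jump in prompt size, while a point above the R chart's UCL points to unusually mixed request sizes within a subgroup.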
Frequently Asked Questions
What is Statistical Process Control (SPC) in LLM performance monitoring?
Statistical Process Control (SPC) is a method of quality control that uses statistical techniques, primarily control charts, to monitor and control a process, distinguishing between common-cause variation (inherent to the system) and special-cause variation (due to an assignable source). In LLM performance monitoring, SPC is applied to track key operational metrics—such as latency percentiles (P99), tokens per second (TPS), error rates, and output quality scores—over time. By establishing statistical baselines and control limits, engineers can detect anomalies, output drift, or performance degradation that signal issues like infrastructure failures, model regression, or concept drift in user inputs, enabling proactive intervention before service level objectives (SLOs) are violated.
Related Terms
Statistical Process Control (SPC) is a foundational methodology for quality assurance. In LLM operations, it is applied through specific tools and metrics to ensure stable, predictable model behavior. The following terms are essential for implementing and understanding SPC in this context.
Control Chart
A control chart is the primary graphical tool used in SPC to monitor process behavior over time. It plots a key performance metric (e.g., LLM latency, output score) against time or sequence order, with statistically calculated upper and lower control limits (UCL/LCL) and a center line (often the mean).
- Points within the control limits indicate a process in statistical control (common-cause variation).
- Points outside the limits or forming non-random patterns signal special-cause variation, triggering an investigation for anomalies like model degradation or infrastructure issues.
Service Level Indicator (SLI)
A Service Level Indicator (SLI) is a quantitatively measured aspect of an LLM service's performance that SPC monitors. It is the raw metric plotted on a control chart.
- Examples in LLM Ops: P99 latency, Tokens per Second (TPS), error rate, output quality score.
- SLIs must be measurable, relevant, and a direct indicator of user-perceived service quality. SPC uses SLI data to distinguish normal fluctuation from problematic drift.
Anomaly Detection
Anomaly detection is the automated identification of patterns in data that deviate from expected behavior. SPC provides a statistically rigorous framework for anomaly detection in LLM metrics.
- SPC vs. ML-based detection: SPC uses control limits derived from process history, while machine learning methods may use unsupervised clustering or forecasting.
- In practice, SPC control charts are often integrated with ML-driven anomaly detection systems to reduce false positives and provide explainable alerts based on statistical rules.
Output Drift & Concept Drift
Output drift and concept drift are key phenomena SPC is designed to detect.
- Output Drift: A change in the statistical distribution of the LLM's generated text or embeddings over time. Monitored by tracking metrics like average log-probability or embedding cosine similarity against a golden dataset.
- Concept Drift: A change in the underlying relationship between the model's inputs and the desired outputs in the real world. This degrades task performance (e.g., classification accuracy) even if the model itself is unchanged. SPC charts on task-specific scores can signal concept drift.
Golden Dataset
A golden dataset is a curated, high-quality, and stable set of input-output pairs used as a reference standard for evaluation. It is critical for SPC in LLM monitoring.
- Function: Serves as the baseline for calculating control chart limits for quality metrics (e.g., correctness score, BLEU score).
- Process: The LLM's performance on the golden dataset is sampled regularly. The results are plotted on a control chart; drift indicates a change in the model's behavior relative to the known standard.
Cohort Analysis
Cohort analysis is the practice of segmenting traffic into groups for comparative evaluation. It enhances SPC by enabling more granular monitoring.
- Cohorts can be based on: Model version, user segment, request type (e.g., creative vs. factual), or hardware partition.
- SPC Application: Separate control charts are maintained for key cohorts. This allows detection of issues affecting only a specific segment (e.g., latency degradation for users in a specific region) that might be masked in aggregate metrics.

About the author
Prasad Kumkar
CEO & MD, Inference Systems
Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over more than five years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.
His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.