The Agent HorizontalPodAutoscaler (HPA) is a Kubernetes controller that automatically scales the number of replicas (pods) in an agent deployment or statefulset based on observed CPU utilization, memory consumption, or custom metrics. It operates via a continuous control loop that queries the metrics server (or a custom metrics API) to compare the current metric values against the target thresholds defined in its specification. If the observed metric value exceeds the target, the HPA instructs the deployment controller to increase the number of agent pods. Conversely, if utilization is below the target, it scales the replicas down, optimizing resource usage and cost.
Key Components:
- Metrics Server: Aggregates resource usage data from each node.
- HPA Controller: The core logic that calculates desired replica counts.
- Scale Subresource: The API endpoint the HPA interacts with to modify the
.spec.replicas field of a deployment.
The scaling decision is governed by the formula: desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]. The HPA also incorporates configurable stabilization windows and scaling policies to prevent rapid, thrashing scale events.