Problem Statement
AKS provides multiple scaling dimensions—horizontal pod scaling, vertical resource adjustment, cluster autoscaling, event-driven scaling, and cross-cluster distribution—each addressing different aspects of workload management. However, understanding when to apply each mechanism, how they interact, and their respective trade-offs remains challenging for platform teams navigating production deployment decisions.
Solution
Understanding the capabilities, limitations, and optimal use cases for each AKS scaling mechanism enables informed architectural decisions aligned with specific workload characteristics and operational requirements. The solution requires evaluating six primary scaling approaches and their strategic combinations.
Quick Decision Framework
- Need application-level autoscaling (based on CPU, memory, or custom metrics)? Use HPA (Horizontal Pod Autoscaler).
- Need event-driven, scale-to-zero behavior? Use KEDA (with HPA integration).
- Need cluster/VM-level autoscaling for pod resource pressure? Use Cluster Autoscaler (CAS) for VMSS-based pools.
- Need right-sized node provisioning with less operational overhead? Use NAP (Node Auto Provisioning) — AKS-managed Karpenter.
- Need to tune pod resource requests automatically? Consider VPA (Vertical Pod Autoscaler) in recommendation mode.
- Scaling across regions or many clusters? Use Azure Kubernetes Fleet Manager.
The AKS Scaling Repository provides implementation guides for each mechanism, with validated examples and step-by-step instructions. It emphasizes the underlying mechanics and design rationale, so technical readers understand how each solution works, not just how to use it.
Horizontal Pod Autoscaler (HPA)
HPA automatically adjusts replica counts for Deployments or ReplicaSets based on observed metrics (CPU utilization, memory consumption, or custom application metrics). It operates at the application layer, scaling horizontally to distribute load across multiple pod instances. HPA requires Metrics Server for resource-based scaling or a Prometheus adapter for custom metrics; Metrics Server is installed by default on new AKS clusters.
Key limitations: Cannot scale to zero (minimum replica typically ≥1), requires reliable metrics pipeline, and depends on sufficient cluster capacity for new pods.
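A minimal HPA manifest illustrates the pattern. The Deployment name `api` and the thresholds here are hypothetical; adjust them to the target workload.

```yaml
# Illustrative HPA (autoscaling/v2) targeting a hypothetical "api" Deployment.
# Scales between 2 and 10 replicas to hold average CPU at ~70% of requests.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

Note that utilization targets are computed against pod resource requests, which is one reason accurate requests (see VPA below) matter for HPA behavior.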
Vertical Pod Autoscaler (VPA)
VPA analyzes historical resource consumption patterns and recommends (or automatically applies) adjustments to pod resource requests and limits. Rather than scaling replica counts, VPA right-sizes individual pods to match actual resource needs, preventing the waste of over-allocation and the performance issues of under-allocation. VPA operates in several update modes: Off (recommendation-only, safe for production), Initial (applies recommendations only when pods are created), and Auto/Recreate (evicts and restarts pods to apply changes). Best used in recommendation mode for production workloads, with changes applied during maintenance windows.
Critical consideration: VPA and HPA should generally not actively control the same pods simultaneously. Use VPA to establish appropriate resource requests while HPA manages replica scaling.
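A sketch of a recommendation-only VPA, assuming a hypothetical `api` Deployment as the target:

```yaml
# Illustrative VPA in recommendation-only mode: it publishes suggested
# requests/limits in its status but never modifies running pods.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  updatePolicy:
    updateMode: "Off"   # recommendation-only; safe alongside HPA
```

The recommendations can then be read from the VPA object's status and applied manually during a maintenance window.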
KEDA (Kubernetes Event-Driven Autoscaling)
KEDA extends Kubernetes autoscaling capabilities to event-driven architectures by enabling pod scaling based on external metrics and event sources—message queue depth (Service Bus, RabbitMQ, Kafka), blob storage counts, HTTP request rates, database metrics, and numerous other triggers. KEDA can scale workloads to zero replicas during idle periods (unlike HPA) and activates pods when events arrive. It integrates with HPA, translating external metrics into HPA-compatible scaling signals. Best suited for queue consumers, stream processors, scheduled batch jobs, webhook handlers, and any workload where scaling decisions depend on external system state rather than internal pod metrics.
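As a sketch, a KEDA ScaledObject for a queue consumer might look like the following. The Deployment name, queue name, and TriggerAuthentication reference are hypothetical placeholders:

```yaml
# Illustrative KEDA ScaledObject: scales a hypothetical "orders-consumer"
# Deployment on Azure Service Bus queue depth, down to zero when idle.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: orders-consumer-scaler
spec:
  scaleTargetRef:
    name: orders-consumer     # hypothetical Deployment
  minReplicaCount: 0          # scale to zero during idle periods
  maxReplicaCount: 20
  triggers:
    - type: azure-servicebus
      metadata:
        queueName: orders
        messageCount: "50"    # target messages per replica
      authenticationRef:
        name: servicebus-auth # hypothetical TriggerAuthentication
```

Behind the scenes, KEDA creates and manages an HPA for the target workload and handles the zero-to-one activation itself.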
Cluster AutoScaler (CAS)
Cluster Autoscaler manages node counts for pre-defined VM Scale Set (VMSS) node pools by observing pending pods that cannot be scheduled due to insufficient cluster capacity. When resource pressure is detected, CAS requests additional nodes from Azure, typically provisioning new VMs within a few minutes. CAS operates at the infrastructure layer, scaling node pools up when capacity is needed and down when nodes become underutilized. Best suited for existing AKS deployments with predictable VM type requirements and teams comfortable managing multiple specialized node pools for different workload categories.
Key limitations: Requires pre-creating node pools with specific VM sizes, and can lead to bin-packing inefficiencies when pod resource shapes don't match the fixed VM sizes.
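On AKS, CAS is enabled per node pool with a min/max node range. A sketch using hypothetical resource names:

```
# Enable the cluster autoscaler on an existing VMSS-based node pool,
# allowing it to scale between 1 and 10 nodes. Resource group, cluster,
# and pool names are placeholders.
az aks nodepool update \
  --resource-group my-rg \
  --cluster-name my-aks \
  --name userpool \
  --enable-cluster-autoscaler \
  --min-count 1 \
  --max-count 10
```

CAS then watches for unschedulable pods and adjusts the pool's node count within that range.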
Node Auto Provisioning (NAP) — Managed Karpenter
NAP represents Microsoft's fully managed implementation of the open-source Karpenter project, providing node-level right-sizing where the AKS control plane provisions VMs dynamically based on actual pod requirements. Unlike CAS, NAP eliminates the need to define multiple node pools upfront—instead, platform teams define policies (NodePool and AKSNodeClass CRDs) specifying constraints like VM families, capacity types (spot/on-demand), and resource limits.
Trade-offs: NAP reduces complexity but provides fewer node-level configuration options compared to self-hosted Karpenter.
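As an illustration of the policy-driven model, a NodePool constraining VM families and capacity types might look roughly like this. API versions and label keys vary across Karpenter/NAP releases, so treat this as a sketch rather than a copy-paste manifest:

```yaml
# Illustrative NAP/Karpenter NodePool: lets AKS pick any D- or E-family
# VM size, spot or on-demand, up to an aggregate limit of 100 vCPUs.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: general-purpose
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.azure.com
        kind: AKSNodeClass
        name: default
      requirements:
        - key: karpenter.azure.com/sku-family
          operator: In
          values: ["D", "E"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
  limits:
    cpu: "100"
```

Because the constraints describe a range of acceptable VMs rather than one fixed size, NAP can pick a right-sized SKU per provisioning decision, which is the bin-packing advantage over CAS.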
Azure Kubernetes Fleet Manager
Fleet Manager provides centralized management for multiple AKS clusters through a hub-and-spoke architecture, enabling workload distribution and scaling across clusters, regions, or availability zones. Platform teams define resources once on the Fleet hub cluster and use ClusterResourcePlacement policies to propagate workloads to member clusters. Scaling operations performed at the hub level automatically distribute across selected member clusters based on placement rules (PickAll, PickN, PickFixed). Best suited for multi-region resilience strategies, workloads exceeding single-cluster capacity limits, centralized governance requirements, or organizations managing numerous AKS clusters.
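A sketch of a ClusterResourcePlacement that propagates a hypothetical `web-app` namespace to two member clusters using the PickN policy:

```yaml
# Illustrative Fleet Manager placement: selects the "web-app" namespace
# (and the resources within it) on the hub and places it on 2 member
# clusters chosen by the scheduler.
apiVersion: placement.kubernetes-fleet.io/v1
kind: ClusterResourcePlacement
metadata:
  name: web-app-placement
spec:
  resourceSelectors:
    - group: ""
      version: v1
      kind: Namespace
      name: web-app
  policy:
    placementType: PickN
    numberOfClusters: 2
```

With PickAll the workload would land on every matching member cluster, while PickFixed pins it to an explicitly named list.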
References
AKS Scaling Repository — Comprehensive Guides and Examples
Azure Kubernetes Service Documentation
Kubernetes Autoscaling Documentation
KEDA — Kubernetes Event-Driven Autoscaling
Karpenter Documentation
Azure Karpenter Provider
Azure Kubernetes Fleet Manager