Requests and Limits, the Silent Cause of Production Outages, and How CEL Can Save You

May 30, 2026 · 10 min read

Table of Contents

Problem Statement
Solution

Problem Statement

Kubernetes will happily run a workload that has no resources.requests and no resources.limits. The pod schedules, the readiness probe goes green, the smoke tests pass, and the team ships. The cluster is now carrying a workload the scheduler treats as effectively free. Zero CPU reserved, zero memory accounted for, no upper bound on what it can consume.

In a quiet dev cluster, nothing visible happens. In production, under real traffic, the same workload becomes the cause of an outage that is hard to attribute and harder to reproduce:

A pod with no memory limit keeps growing until the node runs out of memory. The kubelet invokes node-pressure eviction, or, if memory runs out first, the Linux OOM killer fires on whichever process the kernel picks, which can be another tenant's pod rather than the offender.
A pod with no CPU request is scheduled as if it needs nothing. The scheduler packs it next to latency-sensitive workloads. Under load it soaks up whatever CPU is idle, and a neighbour that was counting on that headroom to absorb its own bursts starts missing its SLO.
A pod with no memory request can be scheduled onto a node that has no real headroom. The first heap expansion or traffic burst pushes the node into MemoryPressure and the eviction cascade begins.
A pod with a limit but no request has its request silently defaulted to the limit (for that resource). That sounds safe until you realise you've now reserved far more capacity than the workload actually uses: bin-packing suffers, and because the HPA measures utilisation against that inflated request, autoscaling decisions are wrong too.
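To make the limit-without-request case concrete, here is a minimal sketch of a pod that sets only limits. The pod name, image, and sizes are illustrative assumptions. Because no requests are given, the API server copies each limit into the matching request when the pod is created, so the scheduler reserves the full 2 CPU and 4Gi even if the workload idles at a fraction of that.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: limits-only-example               # hypothetical name
spec:
  containers:
  - name: app
    image: registry.example.com/app:1.0   # hypothetical image
    resources:
      limits:
        cpu: "2"
        memory: 4Gi
      # No requests block. When the pod is created, the stored object carries
      #   requests: {cpu: "2", memory: 4Gi}
      # copied from the limits above, which also makes this pod Guaranteed QoS
      # and reserves 2 CPU / 4Gi on whichever node it lands on.
```

You can confirm the defaulting with kubectl get pod limits-only-example -o yaml and inspecting spec.containers[0].resources.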

These failure modes rarely show up in pre-prod. Pre-prod has spare capacity, low concurrency, and short-lived test runs. Production has neighbours, sustained load, and the long tail. Requests and limits are the contract that tells the scheduler and the kubelet how to behave at saturation, which is exactly when that contract matters.

Solution

Treat requests and limits as a required field, not a suggestion. Decide a policy for how requests and limits should relate for each workload class, and then enforce that policy at admission time so a non-compliant pod cannot reach the cluster in the first place. Kubernetes ships everything you need to do this in-tree: ValidatingAdmissionPolicy, ValidatingAdmissionPolicyBinding, and the Common Expression Language (CEL).

The rest of this post covers three things:

What requests and limits actually do, and why both matter.
When limit > request is the right call, and when limit == request is the only safe choice.
How to enforce your chosen policy with CEL, with no webhook, no extra pod, and no network hop.

What Requests and Limits Actually Do

These two fields look similar and are constantly conflated. They control different subsystems.

requests is consumed by the scheduler. It is the amount of CPU and memory the scheduler subtracts from a node's allocatable capacity when deciding whether the pod fits. It is also what the Horizontal Pod Autoscaler uses as the denominator when computing utilisation, and the kubelet turns the CPU request into a CFS weight that decides how contended CPU is shared. Requests do not cap runtime usage.
limits is consumed by the kubelet and the Linux kernel at runtime. CPU limits are enforced by CFS bandwidth control: the container gets a quota of CPU time per period (100ms by default) and is throttled for the rest of the period once that quota is used up. Memory limits are enforced by the cgroup OOM killer; exceed the limit and your container is killed, not throttled.
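A small annotated resources block shows which subsystem reads which field. The values are illustrative assumptions, not recommendations, and the CFS figure assumes the default 100ms period: a 500m CPU limit works out to 50ms of CPU time per 100ms period.

```yaml
# Illustrative container resources fragment; values are assumptions.
resources:
  requests:
    cpu: 250m      # scheduler reserves 0.25 CPU; kubelet derives the CFS weight from it
    memory: 512Mi  # scheduler reserves 512Mi; HPA memory utilisation is measured against it
  limits:
    cpu: 500m      # CFS quota: 50ms of CPU time per 100ms period, then throttled
    memory: 1Gi    # cgroup memory limit: going over it gets the container OOM-killed
```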

A few consequences follow directly from this:

A pod with no request is invisible to the scheduler's capacity math. The scheduler will overcommit the node.
A pod with no memory limit can grow until the node is out of memory. If the kubelet's node-pressure eviction reacts in time, it ranks victims by whether and how far usage exceeds requests (and by pod priority); if the node runs out first, the kernel OOM killer picks the victim, and that is not necessarily the offender.
The combination of requests and limits also determines the pod's QoS class (Guaranteed, Burstable, BestEffort). The kubelet does not rank evictions by QoS class directly, but the class is a good estimate of eviction order, and it sets the oom_score_adj that the kernel OOM killer uses. BestEffort (no requests, no limits) is evicted first. Guaranteed (requests == limits for both CPU and memory in every container) is evicted last.
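One detail of the QoS rules is easy to miss: Guaranteed applies only if every container in the pod, including sidecars and init containers, has CPU and memory requests equal to its limits. The sketch below uses hypothetical names and images. Because the sidecar sets no resources, the whole pod is classified Burstable even though the main container looks Guaranteed on its own.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: qos-example                           # hypothetical name
spec:
  containers:
  - name: app
    image: registry.example.com/app:1.0       # hypothetical image
    resources:
      requests: {cpu: "1", memory: 2Gi}
      limits:   {cpu: "1", memory: 2Gi}       # requests == limits for this container
  - name: log-shipper
    image: registry.example.com/shipper:1.0   # hypothetical image
    # No resources at all: this container alone drops the pod
    # from Guaranteed to Burstable.
```

The assigned class is recorded in status.qosClass, so kubectl get pod qos-example -o jsonpath='{.status.qosClass}' should report Burstable for this pod.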

The diagram below shows three pods sharing a node and what happens when the node hits memory pressure. The scheduler placed them based on the sum of requests (which fits within allocatable), but the sum of limits is overcommitted. When real usage pushes the node over the eviction threshold, the kubelet picks victims in QoS order, starting with BestEffort and any Burstable pod that has burst above its request.

Two things to take away from this picture:

Requests drive scheduling, not runtime. The node accepted these three pods because the sum of requests (3 CPU / 6Gi) fit comfortably. Actual usage (6 CPU / 14Gi) is much higher, because Burstable pods are allowed to exceed their request and BestEffort pods have no request at all.
Eviction order tracks QoS plus "how far above request". The BestEffort pod goes first because it has no request to protect it. The Burstable pod goes next because it is using far more memory than it reserved. The Guaranteed pod is the last to be touched, which is exactly why latency-critical workloads should run as Guaranteed.

When `limit > request` Is the Right Call

For most steady-state workloads, setting limit > request is the right efficiency play. You reserve the capacity the workload needs to run normally (the request) and you allow it to burst into idle capacity up to the limit. The node can be packed more densely because the sum of requests fits, even though the sum of limits may exceed node capacity. This is the Burstable QoS class.

The trade-off is worth stating plainly: when the node is contended, a Burstable pod that has burst above its request is a candidate for eviction. The kubelet's node-pressure eviction explicitly ranks pods by how far their usage exceeds their request. Burst is a loan, not a grant. If you can tolerate the occasional eviction and restart, and most stateless web and API workloads can, then limit > request gives you meaningfully better cluster utilisation and lower cost.
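A minimal sketch of a Burstable workload, assuming a stateless API Deployment; the name, image, replica count, and sizes are illustrative. Each replica reserves 500m CPU and 512Mi but may burst to 1 CPU and 1Gi, so the scheduler accounts for 2 CPU / 2Gi across the four replicas while their combined limits add up to 4 CPU / 4Gi.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api                                   # hypothetical name
spec:
  replicas: 4
  selector:
    matchLabels: {app: api}
  template:
    metadata:
      labels: {app: api}
    spec:
      containers:
      - name: api
        image: registry.example.com/api:1.0   # hypothetical image
        resources:
          requests:          # what the scheduler reserves per replica
            cpu: 500m
            memory: 512Mi
          limits:            # burst ceiling per replica (limit > request => Burstable)
            cpu: "1"
            memory: 1Gi
# Across 4 replicas: requests total 2 CPU / 2Gi, limits total 4 CPU / 4Gi.
# The gap is burst headroom borrowed from whatever the node has idle.
```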

When `limit == request` Is the Only Safe Choice

There is one workload pattern where limit > request will hurt you: rapidly scaling, latency-sensitive workloads. Think of a pod that goes from idle to fully saturated in seconds: bursty request fan-out, a job that mmaps a large dataset on startup, a stream processor catching up after a lag spike.

The problem is that standard node-pressure eviction is not fast enough to protect you in that regime. The kubelet checks eviction signals on a periodic housekeeping loop, and soft eviction thresholds add grace periods on top of that (hard thresholds have none, but are still only checked periodically). By the time the kubelet decides to evict, your bursting pod may already have pushed the node out of memory, at which point the kernel OOM killer fires and picks a victim using its own oom_score heuristic rather than the kubelet's eviction ranking. The kubelet only influences that choice indirectly, through each pod's oom_score_adj, so the process that gets killed is not necessarily the one that caused the pressure.

The defence is to give those workloads the Guaranteed QoS class by setting requests == limits for both CPU and memory in every container. The scheduler reserves the worst-case capacity, the pod can never use more memory than the scheduler reserved for it, the kubelet treats the pod as last-to-evict, and the kernel OOM killer prefers other (Burstable, BestEffort) victims when memory does run out. You pay for the reservation in cluster cost; you buy predictability and the lowest possible eviction priority.
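For comparison, a minimal sketch of a container configured for Guaranteed QoS; the name, image, and sizes are illustrative assumptions. Requests and limits are identical for both CPU and memory, which is what the Guaranteed class requires.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: stream-processor                          # hypothetical name
spec:
  containers:
  - name: processor
    image: registry.example.com/processor:1.0     # hypothetical image
    resources:
      requests:
        cpu: "2"
        memory: 4Gi
      limits:                # identical to requests => Guaranteed QoS
        cpu: "2"
        memory: 4Gi
```

The scheduler will only place this pod on a node with 2 CPU and 4Gi of unreserved allocatable capacity, which is the cost side of the trade-off.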

A reasonable default policy for a platform team:

Stateless web/API, batch jobs: require both requests and limits, allow limit > request (Burstable).
Latency-critical or rapidly scaling workloads: require requests == limits for CPU and memory (Guaranteed).
Always forbid: missing requests, missing limits, and BestEffort pods in production namespaces.
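One way to record that policy is with namespace labels that admission bindings can select on. The namespace names below are hypothetical; the label keys are the ones the bindings later in this post select on (enforce-resources and qos-tier).

```yaml
# Hypothetical namespaces, labelled by workload class.
apiVersion: v1
kind: Namespace
metadata:
  name: web-api
  labels:
    enforce-resources: "true"   # requests and limits required; Burstable allowed
---
apiVersion: v1
kind: Namespace
metadata:
  name: realtime-ingest
  labels:
    enforce-resources: "true"   # requests and limits required...
    qos-tier: guaranteed        # ...and requests == limits (Guaranteed)
```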

Enforcing the Policy with CEL

Once you have a policy, the next question is how to keep non-compliant pods out of the cluster. The traditional answer was a mutating or validating webhook (OPA/Gatekeeper, Kyverno, or a custom admission controller). Those still work, but they add a pod to run, a service to reach, a certificate to rotate, and a network hop on every admission request.

Kubernetes 1.30+ ships ValidatingAdmissionPolicy and ValidatingAdmissionPolicyBinding, which let the API server evaluate a rule in-process using CEL. No webhook. No external pod. No certificate. The rule runs inside the API server itself.

The model uses two resources:

ValidatingAdmissionPolicy defines what to check: the CEL expression, which resource types it applies to, and the error message.
ValidatingAdmissionPolicyBinding defines where to enforce it: which namespaces, and whether violations should Deny, Warn, or Audit.

The policy is dormant until a binding activates it. That separation lets you author a single policy and roll it out independently per namespace (Warn in staging, Deny in production) without editing the rule.
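A sketch of that split rollout, assuming namespaces named staging and prod: both bindings reference the same policy (the require-requests-and-limits policy defined in the next section) and differ only in validationActions. They select namespaces by the kubernetes.io/metadata.name label, which the API server sets automatically on every namespace; the binding names are hypothetical.

```yaml
# Same policy, two bindings: report in staging, reject in prod.
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: require-requests-and-limits-staging   # hypothetical name
spec:
  policyName: require-requests-and-limits
  validationActions: [Warn, Audit]            # report violations, do not reject
  matchResources:
    namespaceSelector:
      matchLabels:
        kubernetes.io/metadata.name: staging
---
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: require-requests-and-limits-prod      # hypothetical name
spec:
  policyName: require-requests-and-limits
  validationActions: [Deny]                   # reject non-compliant pods
  matchResources:
    namespaceSelector:
      matchLabels:
        kubernetes.io/metadata.name: prod
```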

The diagram below shows where the CEL check sits in the admission path and what each outcome means for the cluster.

A CEL Rule That Requires Both Requests and Limits

The following ValidatingAdmissionPolicy rejects any pod whose containers do not have both CPU and memory requests and limits set.

apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: require-requests-and-limits
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
    - apiGroups: [""]
      apiVersions: ["v1"]
      operations: ["CREATE", "UPDATE"]
      resources: ["pods"]
  validations:
  - expression: >-
      object.spec.containers.all(c,
        has(c.resources) &&
        has(c.resources.requests) && has(c.resources.requests.cpu) && has(c.resources.requests.memory) &&
        has(c.resources.limits)   && has(c.resources.limits.cpu)   && has(c.resources.limits.memory))
    message: "All containers must set CPU and memory for both requests and limits."

A few things worth pointing out in the expression:

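object is the incoming Pod. .spec.containers.all(c, ...) requires the condition to hold for every container. Init containers are not covered; to check them too, add a separate expression over .spec.initContainers and guard it with !has(object.spec.initContainers) ||, because the field is optional and a missing list would otherwise be a runtime error.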
has(...) is the safe way to test for field presence in CEL. Using c.resources.limits.cpu directly on a pod that has no resources block would be a runtime error, and failurePolicy: Fail would then reject the request. That outcome is what you want for a safety rule, but has() gives you a clean, intentional check with a useful error message.
failurePolicy: Fail means "if the expression itself errors, deny". That is the correct default for a guardrail. Use Ignore only for purely advisory checks.
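If you also want init containers covered, one option is a second entry under spec.validations in the same policy. This is a sketch: because spec.initContainers is optional, the expression guards it with has() so that pods without init containers do not hit a runtime error, which failurePolicy: Fail would turn into a denial.

```yaml
  # Hypothetical second entry for spec.validations in require-requests-and-limits.
  - expression: >-
      !has(object.spec.initContainers) ||
      object.spec.initContainers.all(c,
        has(c.resources) &&
        has(c.resources.requests) && has(c.resources.requests.cpu) && has(c.resources.requests.memory) &&
        has(c.resources.limits)   && has(c.resources.limits.cpu)   && has(c.resources.limits.memory))
    message: "All init containers must set CPU and memory for both requests and limits."
```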

To activate the policy on a single namespace, label the namespace and bind:

apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: require-requests-and-limits-binding
spec:
  policyName: require-requests-and-limits
  validationActions: [Deny]
  matchResources:
    namespaceSelector:
      matchLabels:
        enforce-resources: "true"

kubectl label namespace prod enforce-resources=true

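Roll out with validationActions: [Warn, Audit] first. Warn returns each violation to the client as an HTTP warning (kubectl prints it when you apply a bare Pod), Audit records it in the API server's audit events when audit logging is enabled, and no pods are rejected. For Deployments, Jobs, and other controllers, the Pod is created by the controller rather than by kubectl, so those warnings go to the controller, and a later denial shows up as a FailedCreate event on the owning object (for example the ReplicaSet). Once the warnings stop, switch the binding to [Deny].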
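Since a Deployment's pods are created by its ReplicaSet, the pod-level policy never sees the kubectl apply itself. If you want the rejection to reach the person running kubectl, one option (a sketch, not the only approach) is a companion policy that runs the same check against the pod template of apps/v1 Deployments. The policy name is hypothetical, and other workload kinds (StatefulSets, DaemonSets, Jobs) would need their own rules.

```yaml
# Companion policy: check Deployment pod templates so that kubectl apply
# of a non-compliant Deployment is rejected up front.
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: require-requests-and-limits-deployments   # hypothetical name
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
    - apiGroups: ["apps"]
      apiVersions: ["v1"]
      operations: ["CREATE", "UPDATE"]
      resources: ["deployments"]
  validations:
  - expression: >-
      object.spec.template.spec.containers.all(c,
        has(c.resources) &&
        has(c.resources.requests) && has(c.resources.requests.cpu) && has(c.resources.requests.memory) &&
        has(c.resources.limits)   && has(c.resources.limits.cpu)   && has(c.resources.limits.memory))
    message: "All containers in the Deployment's pod template must set CPU and memory for both requests and limits."
```

Like any policy, it stays dormant until you give it its own ValidatingAdmissionPolicyBinding, for example one selecting the same enforce-resources namespace label.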

A CEL Rule That Requires `requests == limits` for Guaranteed Workloads

For the rapidly scaling case, you want a stronger rule on a specific set of namespaces (or a specific label selector on the pod). The expression below requires that for every container, the CPU request equals the CPU limit and the memory request equals the memory limit.

apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: require-guaranteed-qos
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
    - apiGroups: [""]
      apiVersions: ["v1"]
      operations: ["CREATE", "UPDATE"]
      resources: ["pods"]
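  validations:
  # has() checks each key first, so a missing value yields the message below
  # instead of an expression error. quantity() parses values as Kubernetes
  # quantities, so equal amounts spelled differently (1Gi vs 1073741824) match.
  - expression: >-
      object.spec.containers.all(c,
        has(c.resources) && has(c.resources.requests) && has(c.resources.limits) &&
        has(c.resources.requests.cpu)    && has(c.resources.limits.cpu) &&
        has(c.resources.requests.memory) && has(c.resources.limits.memory) &&
        quantity(c.resources.requests.cpu).compareTo(quantity(c.resources.limits.cpu)) == 0 &&
        quantity(c.resources.requests.memory).compareTo(quantity(c.resources.limits.memory)) == 0)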
    message: "Containers in this namespace must have requests == limits for CPU and memory (Guaranteed QoS)."

Bind this policy only to the namespaces that host latency-critical or fast-scaling workloads (for example, label them qos-tier=guaranteed and select on that label in the binding). Pods that try to land in those namespaces without matching requests and limits are rejected at admission time, before the scheduler ever sees them.
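A minimal binding for that setup might look like the following. The binding name is an assumption; it selects on the qos-tier=guaranteed namespace label described above.

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: require-guaranteed-qos-binding   # hypothetical name
spec:
  policyName: require-guaranteed-qos
  validationActions: [Deny]
  matchResources:
    namespaceSelector:
      matchLabels:
        qos-tier: guaranteed
```

To target pods by their own labels instead of by namespace, matchResources also accepts an objectSelector, which is matched against the labels of the incoming object.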

Why This Is Worth the Twenty Minutes It Takes

Most production stability problems caused by missing requests and limits are invisible until they aren't, and when they surface they look like noisy-neighbour incidents, mysterious OOM kills, or autoscaling that "doesn't work". The root cause is almost always a workload that was admitted without the contract the rest of the system relies on.

A CEL ValidatingAdmissionPolicy is a single YAML file. It runs in the API server with no extra moving parts. It rejects the bad pod at admission time with a clear message (straight back to kubectl for a bare Pod, or as a FailedCreate event on the owning ReplicaSet or Job when a controller creates it), so the developer fixes it once instead of the on-call engineer debugging it at 3am. For the cost of one policy and one binding per cluster, you eliminate an entire class of production incident.

Simple validation, applied early, is one of the highest-leverage things a platform team can do.
