How to Prevent Critical Pod Eviction in Kubernetes: Resource Pressure, QoS, Ephemeral Storage 2025

Stop unexpected pod evictions in Kubernetes. Learn how to handle resource pressure, set proper QoS classes, manage ephemeral storage limits, and use tools like Karpenter to keep your production clusters stable and cost-efficient in 2025.

When a node runs short on resources (memory, disk, or process IDs), the kubelet kicks in its survival instincts: it evicts pods to protect the node’s health.

In cloud-native production, surprise evictions mean downtime, broken pipelines, and hidden cost explosions.

This guide breaks down:

  • Exactly how evictions work

  • The role of QoS classes

  • What changed in Kubernetes by 2025

  • How to prevent costly resource pressure

  • Production-ready YAML and real kubectl commands

What Is Critical Pod Eviction?

Eviction is the kubelet’s last-resort measure to free up resources. If your node’s RAM or disk crosses thresholds, the kubelet forcibly terminates lower-priority pods until resource health is restored.

Eviction is about the node staying healthy, not about whether your app wants to stay alive.

Critical Pod Eviction: Quick Reference Table

What: Eviction means the kubelet forcefully removes pods from a node to free up critical resources (memory, disk, PIDs). It’s not a crash — it’s a protective action.

How: The kubelet monitors node conditions. When resource usage crosses defined thresholds (e.g., low available memory, disk full, too many PIDs), it picks lower-priority pods for eviction based on their QoS class.

Why: To keep the node itself healthy. If the node runs out of memory, disk, or PIDs, all workloads would fail. Eviction sacrifices the least critical pods to protect the whole system.

When: In real time, as soon as a node hits resource pressure thresholds — often triggered by unexpected usage spikes, log file growth, runaway processes, or unbounded ephemeral storage.

Where: On any Kubernetes worker node under resource pressure. Most common in high-density clusters, stateful workloads with heavy disk IO, or legacy apps with poor resource requests.

What To Do:

1. Set realistic requests & limits to get higher QoS.
2. Control ephemeral storage with explicit limits.
3. Use HPA/VPA and Karpenter for right-sizing.
4. Monitor MemoryPressure, DiskPressure, PIDPressure.
5. Protect critical workloads with PriorityClasses and PDBs.
6. Test failover & node drain scenarios regularly.

See Eviction Events:

kubectl get events --sort-by='.lastTimestamp' --field-selector reason=Evicted

[Figure: Kubernetes Critical Pod Eviction Workflow]

 OOMKill vs Eviction: Know the Difference

Many confuse OOMKills and Evictions, but they’re different:

OOMKill

  • Triggered by: the Linux kernel (OOM Killer)
  • Why: a container uses more memory than its limit
  • Result: the container is killed instantly
  • Fix: adjust container limits

Eviction

  • Triggered by: the kubelet
  • Why: the node as a whole hits pressure thresholds
  • Result: the pod is gracefully terminated (or forcefully if needed)
  • Fix: adjust node sizing, requests, quotas

How to spot an OOMKill

kubectl describe pod <pod-name>
# Look for 'OOMKilled' in the container status.

How to spot an Eviction

kubectl describe pod <pod-name>
# Status: Failed, Reason: Evicted
# Message: The node had condition: [MemoryPressure].

Understanding Kubernetes QoS Classes

When the node needs to decide which pods live or die, it looks at their QoS class.

  • Guaranteed: every container has requests equal to limits for both CPU and memory. Evicted last.

  • Burstable: at least one container sets CPU or memory requests, but the pod doesn’t meet the Guaranteed criteria (typically requests < limits). Evicted after BestEffort pods.

  • BestEffort: no requests or limits at all. Evicted first.

Example: Guaranteed QoS

resources:
  requests:
    cpu: "500m"
    memory: "256Mi"
  limits:
    cpu: "500m"
    memory: "256Mi"
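
To confirm which QoS class the API server actually assigned, read it straight off the pod status:

kubectl get pod <pod-name> -o jsonpath='{.status.qosClass}'
# Prints Guaranteed, Burstable, or BestEffort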

What Triggers Eviction? Node Signals & Conditions

The kubelet watches signals — basically usage vs thresholds.

  • MemoryPressure (default: memory.available < 100Mi): node RAM is nearly exhausted.

  • DiskPressure (default: nodefs.available < 10%): node disk usage is too high.

  • PIDPressure: too many processes; the node’s process table limit is being hit.

Check node status:

kubectl describe node <node-name>

Look for:

Conditions:
  Type             Status
  ----             ------
  MemoryPressure   True
  DiskPressure     True
  PIDPressure      False
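
The thresholds themselves are configurable per node through the kubelet config file. A minimal sketch of a KubeletConfiguration, assuming you manage kubelet configuration on your nodes (the values here are placeholders to tune for your node sizes):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:                    # kubelet evicts immediately once these are crossed
  memory.available: "200Mi"
  nodefs.available: "10%"
  pid.available: "5%"
evictionSoft:                    # evicted only after the grace period below
  memory.available: "500Mi"
evictionSoftGracePeriod:
  memory.available: "1m"
evictionMaxPodGracePeriod: 60    # max seconds granted to pods during soft eviction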

Ephemeral Storage Evictions: The Silent Killer

Ephemeral storage (a.k.a. container scratch space) is one of the top reasons for unexpected pod evictions in real clusters.

How to check ephemeral usage

kubectl describe pod <pod-name> | grep -i ephemeral

Always set ephemeral storage limits:

resources:
  requests:
    ephemeral-storage: "1Gi"
  limits:
    ephemeral-storage: "2Gi"
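
Scratch data written into an emptyDir volume counts against the same budget, and you can cap it there too with sizeLimit (the volume name below is just an example):

volumes:
- name: scratch              # hypothetical volume name
  emptyDir:
    sizeLimit: "1Gi"         # the pod is evicted if the volume grows past this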

Monitoring & Early Detection

Smart teams detect node pressure before the kubelet does.

Prometheus Rules Example

- alert: NodeMemoryPressure
  expr: kube_node_status_condition{condition="MemoryPressure",status="true"} == 1
  for: 5m
  labels:
    severity: critical

- alert: NodeDiskPressure
  expr: kube_node_status_condition{condition="DiskPressure",status="true"} == 1
  for: 5m

Useful Grafana panels (sample queries after the list):

  • node allocatable vs used

  • pod requests vs usage

  • evicted pods over time
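
The queries below are a sketch of those panels; they assume kube-state-metrics, node-exporter, and cAdvisor metrics are scraped under their default names, so adjust them to your setup:

# node allocatable vs used (memory)
kube_node_status_allocatable{resource="memory"}
node_memory_MemAvailable_bytes

# pod CPU requests vs actual usage
sum by (namespace, pod) (kube_pod_container_resource_requests{resource="cpu"})
sum by (namespace, pod) (rate(container_cpu_usage_seconds_total[5m]))

# evicted pods over time
sum(kube_pod_status_reason{reason="Evicted"})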

Prevention & Hardening Strategies (2025)

 Golden rules:

  1. Always set realistic requests and limits → no BestEffort!

  2. Use PriorityClasses for truly critical pods.

  3. Isolate workloads with taints and tolerations.

  4. Right-size nodes — on AWS EKS, Bottlerocket is a great minimal OS choice.

  5. Use PodDisruptionBudgets (PDBs) to control voluntary evictions.

Priority Class Example

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000
globalDefault: false
description: "Run this only for critical workloads."
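
The class only takes effect once pods reference it. In a Deployment’s pod template (or any pod spec), point priorityClassName at the class defined above:

spec:
  priorityClassName: high-priority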

Taint & Toleration Example

Taint the node:

kubectl taint nodes node1 dedicated=critical:NoSchedule

Then add a matching toleration to the pod spec:

tolerations:
- key: "dedicated"
  operator: "Equal"
  value: "critical"
  effect: "NoSchedule"

Modern Autoscaling & Karpenter (2025)

Horizontal Pod Autoscaler (HPA)

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70

Vertical Pod Autoscaler (VPA)

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind:       Deployment
    name:       api
  updatePolicy:
    updateMode: "Auto"

Karpenter

By 2025, Karpenter is the go-to next-gen node autoscaler for AWS EKS (a minimal NodePool sketch follows the list below). It’s:

  • Faster than Cluster Autoscaler.

  • Ephemeral storage-aware.

  • GPU-aware.

  • Optimized for FinOps.
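
As a rough sketch only (the NodePool API and field names differ between Karpenter releases, so treat this as something to adapt rather than a drop-in manifest), a minimal NodePool that allows Spot and On-Demand capacity and caps total provisioned CPU could look like this:

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
      - key: karpenter.sh/capacity-type
        operator: In
        values: ["spot", "on-demand"]
      - key: kubernetes.io/arch
        operator: In
        values: ["amd64"]
      nodeClassRef:              # references an EC2NodeClass defined separately
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  limits:
    cpu: "1000"                  # cap the total CPU Karpenter may provision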

Stateful Workloads: Extra Protection

Stateful workloads like databases can’t simply be restarted on another node. If they get evicted due to disk pressure, you risk losing data or quorum.

Best practices:

  • Use PodDisruptionBudgets to ensure quorum (see the sketch after the anti-affinity example below).

  • Use anti-affinity rules to spread pods across nodes.

spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - my-db
        topologyKey: "kubernetes.io/hostname"
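
A minimal PodDisruptionBudget for the same hypothetical my-db workload might look like the snippet below (tune minAvailable to your quorum size). Keep in mind that PDBs only guard voluntary disruptions such as node drains; they do not stop kubelet pressure evictions:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-db-pdb
spec:
  minAvailable: 2              # e.g., keep quorum for a 3-replica database
  selector:
    matchLabels:
      app: my-db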

FinOps Impact: Evictions & Hidden Costs

Evictions mean:

  • Evicted pods must be rescheduled → extra API calls and scheduler churn → overhead.

  • Nodes overloaded → poor bin-packing → wasted capacity.

  • Repeated cold starts → cost spikes for ephemeral storage, IOPS, or GPUs.

 FinOps lesson: Proper requests, limits, and autoscaling keep workloads efficient and save dollars.

Production War Story

Incident: A media startup saw their EKS pods evicted every Friday evening. Root cause? Build agents were dumping hundreds of GBs of logs to ephemeral storage → DiskPressure → the kubelet evicted critical backend pods instead of the noisy build agents.

Fix:

  • Added ephemeral storage limits.

  • Shipped logs to S3 via a sidecar.

  • Enabled Karpenter for smarter scale-up.

Result: Evictions dropped by 90%, and infra costs fell 20%.

Troubleshooting Flowchart

Pod evicted? Start here:

1. kubectl describe pod → check Reason.
2. kubectl describe node → check Conditions.
3. Check requests/limits → fix BestEffort pods.
4. Check ephemeral storage → add limits.
5. Tune autoscaling → HPA/VPA/Karpenter.
6. Add Prometheus/Grafana → watch node signals.
7. Protect StatefulSets → PDBs, anti-affinity.

Final YAML Snippets & Commands

List evicted pods

kubectl get pods --all-namespaces --field-selector=status.phase=Failed
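
Once you have finished investigating, evicted pods (which linger as Failed objects) can be cleaned up; double-check the selector before running this against a production cluster:

kubectl delete pods --all-namespaces --field-selector=status.phase=Failed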

Check resource usage

kubectl top nodes
kubectl top pods --all-namespaces

Describe node conditions

kubectl describe node <node-name>

 Test autoscaling

kubectl scale deployment <name> --replicas=20

Advanced Signals, Debugging & Future Roadmap

For readers who want to go beyond day-to-day ops and truly master pod eviction mechanics, here are a few advanced layers to understand and watch as clusters grow more demanding in 2025.

TopologyManager & NUMA Awareness

Modern workloads — like HPC, ML training, or GPU-accelerated AI inferencing — push clusters to squeeze the most out of CPUs, memory, and local PCIe or NUMA nodes.

The TopologyManager aligns pod resource placement for better performance:

  • It coordinates CPU Manager, Device Manager, and Memory Manager.

  • For NUMA-aware nodes, it ensures that CPU cores, attached memory, and devices like GPUs or FPGAs are aligned on the same physical socket.

  • Poor alignment causes cross-socket traffic → higher latency → wasted compute → unpredictable pod resource starvation → possible eviction or performance degradation.

Practical impact:
If you see pods getting evicted or throttled under heavy HPC/AI load, check whether your TopologyManager and CPUManager policies are pinned and aligned.

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
topologyManagerPolicy: "single-numa-node"   # or: none | best-effort | restricted
cpuManagerPolicy: "static"                  # required for exclusive CPU pinning

For high-performance workloads, use single-numa-node where possible — but ensure your nodes have the required NUMA layout.

MemoryManager Enhancements (KEP-3575)

With Kubernetes 1.30+, the MemoryManager can reserve huge pages and align memory allocation for specialized workloads.

Why it matters:
In some HPC or ML nodes, improper memory pinning leads to memory fragmentation, page faults, or cross-NUMA access. This can stress the node’s available RAM faster, accidentally triggering MemoryPressure signals → eviction of unrelated pods.
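
A hedged sketch of the relevant kubelet settings (the Static policy requires a reservedMemory block consistent with your node’s NUMA layout and its kube/system reservations, so the values below are placeholders):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
memoryManagerPolicy: "Static"
reservedMemory:
- numaNode: 0
  limits:
    memory: "1Gi"              # placeholder; must match your node's reservations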

NodeLocal DNSCache

A subtle but real cause of PIDPressure is excessive DNS queries:

  • Large microservice architectures often generate thousands of DNS lookups per second.

  • Heavy, bursty lookup traffic can fan out into many short-lived processes and connections.

  • Heavy burst DNS traffic can push a node over its PID or CPU threshold.

Solution: Deploy NodeLocal DNSCache:

  • It runs a local CoreDNS instance on each node.

  • Reduces the DNS query load on kube-dns.

  • Speeds up lookups and prevents spikes that could indirectly lead to eviction.

kubectl apply -f https://k8s.io/examples/admin/dns/dns-cache.yaml

 For large clusters, this is a best practice.

Ephemeral Containers for Live Debugging

When investigating a node under pressure, ephemeral containers can be your best live debug tool:

  • Unlike normal containers, ephemeral containers can be injected into a running pod.

  • Useful for exploring file usage, logs, or networking inside pods that are in CrashLoop or Terminating state.

  • Great for analyzing why ephemeral storage or PIDs are spiking.

Example:

kubectl debug -it <pod-name> --image=busybox --target=<container-name>

You get a shell inside the namespace of the target container without restarting it.
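
From that shell, a couple of quick checks help pinpoint the pressure source (busybox tooling assumed; the /proc path trick works because --target shares the PID namespace and may require a root debug container):

ps                              # the target container's processes are visible here
du -sh /proc/<pid>/root/tmp     # inspect the target's filesystem via a PID taken from ps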

These advanced layers (TopologyManager, MemoryManager, NodeLocal DNSCache, and ephemeral container debugging) won’t matter for every workload.

But for HPC, AI, GPU workloads, or high-density SaaS clusters, they can make the difference between:

  • Smooth operations, or

  • Mystery evictions and wasted cloud spend.

Keep an eye on the latest Kubernetes Enhancement Proposals (KEPs) for resource management; they’re the roadmap to what’s next.

Conclusion & Next Steps

Kubernetes evictions are not your enemy — they’re your cluster’s emergency brake, designed to keep nodes alive when resources run out. But when evictions catch you off guard, they can quietly break critical apps, ruin SLAs, and drain cloud budgets with hidden rescheduling and cold-start costs.

The fix is not luck — it’s engineering discipline.
By mastering QoS classes, setting realistic resource requests and limits, and enforcing ephemeral storage controls, you decide which workloads survive when resources run thin.

In 2025 and beyond, tools like Karpenter, Bottlerocket, and smarter scheduling policies make right-sizing nodes and automating placement far more efficient — but they only pay off when paired with good FinOps hygiene. Unexpected evictions are expensive; preventing them is cheap compared to the cost of lost trust and wasted infrastructure.

If you act on only one thing today, make it this checklist:
1. Audit your workloads.
2. Check your BestEffort pods.
3. Set storage limits.
4. Deploy autoscaling.
5. Monitor node signals before the kubelet must react for you.

When you combine sound resource governance with modern Kubernetes scheduling, you run leaner, break less often, and spend smarter, all while keeping your clusters resilient under real-world production pressure.

Evictions keep your nodes healthy.
Preparation keeps your business healthy.

More Resources

Kubernetes Pod Eviction: Commands & Usage Reference

kubectl get events --sort-by='.lastTimestamp' --field-selector reason=Evicted
  List recent events, filtered to show pods evicted by the kubelet.

kubectl describe pod <pod-name>
  Inspect pod status; look for an Evicted reason or OOMKilled.

kubectl describe node <node-name>
  View node conditions like MemoryPressure, DiskPressure, PIDPressure.

kubectl top nodes
  Show current CPU and memory usage for all nodes.

kubectl top pods --all-namespaces
  Show resource usage for all pods in all namespaces.

kubectl taint nodes <node-name> key=value:NoSchedule
  Mark a node with a taint to repel unwanted pods unless they tolerate it.

kubectl scale deployment <name> --replicas=<number>
  Manually scale a Deployment up or down to test autoscaling behavior.

kubectl debug -it <pod-name> --image=busybox --target=<container-name>
  Launch an ephemeral container for live debugging inside a running pod.

HPA YAML
  Define a Horizontal Pod Autoscaler that adjusts replicas based on CPU or memory.

VPA YAML
  Define a Vertical Pod Autoscaler that automatically adjusts pod requests/limits.

PriorityClass YAML
  Create a class that prioritizes critical pods to survive eviction longer.

Tolerations YAML
  Allow pods to tolerate node taints (e.g., critical workload isolation).

PodDisruptionBudget YAML
  Ensure a minimum number of pods stay available during voluntary disruptions.

Ephemeral Storage Limits YAML
  Explicitly limit ephemeral-storage to prevent unbounded disk usage.

kubectl apply -f https://k8s.io/examples/admin/dns/dns-cache.yaml
  Deploy NodeLocal DNSCache for local DNS resolution to reduce PID load.