How to Prevent Critical Pod Eviction in Kubernetes: Resource Pressure, QoS, Ephemeral Storage 2025
Stop unexpected pod evictions in Kubernetes. Learn how to handle resource pressure, set proper QoS classes, manage ephemeral storage limits, and use tools like Karpenter to keep your production clusters stable and cost-efficient in 2025.

When your Kubernetes cluster runs out of resources (memory, CPU, or disk), the kubelet kicks in its survival instincts: it evicts pods to protect the node's health.
In cloud-native production, surprise evictions mean downtime, broken pipelines, and hidden cost explosions.
This guide breaks down:
- Exactly how evictions work
- The role of QoS classes
- What changed in Kubernetes by 2025
- How to prevent costly resource pressure
- Production-ready YAML and real `kubectl` commands
What Is Critical Pod Eviction?
Eviction is the `kubelet`'s last-resort measure to free up resources. If your node's RAM or disk crosses thresholds, the `kubelet` forcibly terminates lower-priority pods until resource health is restored.
Eviction is about the node staying healthy, not about whether your app wants to stay alive.
Critical Pod Eviction: Quick Reference Table
| Aspect | Explanation |
|---|---|
| What | Eviction means the `kubelet` forcibly terminates a pod to reclaim resources on its node. The pod ends up with `Status: Failed`, `Reason: Evicted`. |
| How | The `kubelet` watches node signals (memory, disk, inodes, PIDs) against eviction thresholds, ranks pods by QoS class and usage above requests, and terminates the lowest-priority offenders first. |
| Why | To keep the node itself healthy. If the node runs out of memory, disk, or PIDs, all workloads would fail. Eviction sacrifices the least critical pods to protect the whole system. |
| When | In real time, as soon as a node hits resource pressure thresholds; often triggered by unexpected usage spikes, log file growth, runaway processes, or unbounded ephemeral storage. |
| Where | On any Kubernetes worker node under resource pressure. Most common in high-density clusters, stateful workloads with heavy disk IO, or legacy apps with poor resource requests. |
| What To Do | 1. Set realistic requests & limits to get higher QoS. 2. Control ephemeral storage with explicit limits. 3. Use HPA/VPA and Karpenter for right-sizing. 4. Monitor MemoryPressure, DiskPressure, PIDPressure. 5. Protect critical workloads with PriorityClasses and PDBs. 6. Test failover & node drain scenarios regularly. |
See Eviction Events:
kubectl get events --sort-by='.lastTimestamp' --field-selector reason=Evicted

OOMKill vs Eviction: Know the Difference
Many confuse OOMKills and Evictions, but they’re different:
| | OOMKill | Eviction |
|---|---|---|
| Who triggers? | Linux kernel (OOM killer) | The `kubelet` |
| Why? | Container uses more memory than its `limits.memory` | Node as a whole hits pressure thresholds |
| Result | Container is killed instantly | Pod is gracefully terminated (or forcefully if needed) |
| Fix | Adjust container memory limits | Adjust node sizing, requests, quotas |
How to spot an OOMKill
kubectl describe pod <pod-name>
# Look for 'OOMKilled' in the container status.
How to spot an Eviction
kubectl describe pod <pod-name>
# Status: Failed, Reason: Evicted
# Message: The node had condition: [MemoryPressure].
Understanding Kubernetes QoS Classes
When the node needs to decide which pods live or die, it looks at their QoS class.
| QoS Class | When Used | Eviction Priority |
|---|---|---|
| Guaranteed | All containers have equal `requests` and `limits` for CPU and memory | Evicted last |
| Burstable | At least 1 container has `requests` set, but requests and limits are not all equal | Medium |
| BestEffort | No `requests` or `limits` set on any container | Evicted first |
Example: Guaranteed QoS
resources:
  requests:
    cpu: "500m"
    memory: "256Mi"
  limits:
    cpu: "500m"
    memory: "256Mi"
What Triggers Eviction? Node Signals & Conditions
The `kubelet` watches eviction signals: basically usage vs. thresholds.

| Signal | Example Default | When It Matters |
|---|---|---|
| MemoryPressure | `memory.available < 100Mi` | Node RAM nearly exhausted |
| DiskPressure | `nodefs.available < 10%` | Disk usage too high |
| PIDPressure | Too many processes | Process table limit hit |
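These thresholds are tunable per node. A minimal KubeletConfiguration sketch (the field names are the real kubelet API; the values shown are illustrative, close to the upstream defaults):
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Hard thresholds: the kubelet evicts pods immediately once crossed.
evictionHard:
  memory.available: "100Mi"
  nodefs.available: "10%"
  nodefs.inodesFree: "5%"
# Soft thresholds: eviction happens only after the grace period elapses.
evictionSoft:
  memory.available: "300Mi"
evictionSoftGracePeriod:
  memory.available: "1m30s"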
Check node status:
kubectl describe node <node-name>
Look for:
Conditions:
  Type            Status
  ----            ------
  MemoryPressure  True
  DiskPressure    True
  PIDPressure     False
Ephemeral Storage Evictions: The Silent Killer
Ephemeral storage (a.k.a. container scratch space) is one of the top reasons for unexpected pod evictions in real clusters.
How to check ephemeral usage
kubectl describe pod <pod-name> | grep -i ephemeral
Always set ephemeral storage limits:
resources:
  requests:
    ephemeral-storage: "1Gi"
  limits:
    ephemeral-storage: "2Gi"
Monitoring & Early Detection
Smart teams detect node pressure before the `kubelet` does.
Prometheus Rules Example
groups:
  - name: node-pressure
    rules:
      - alert: NodeMemoryPressure
        expr: kube_node_status_condition{condition="MemoryPressure",status="true"} == 1
        for: 5m
        labels:
          severity: critical
      - alert: NodeDiskPressure
        expr: kube_node_status_condition{condition="DiskPressure",status="true"} == 1
        for: 5m
        labels:
          severity: critical
Useful Grafana panels:
- Node allocatable vs. used
- Pod requests vs. actual usage (see the sample PromQL below)
- Evicted pods over time
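A minimal PromQL sketch for the "requests vs. usage" panel, showing per-pod CPU usage as a fraction of requested CPU (metric names come from cAdvisor and kube-state-metrics; exact labels can vary with your versions):
sum by (namespace, pod) (rate(container_cpu_usage_seconds_total{container!=""}[5m]))
  /
sum by (namespace, pod) (kube_pod_container_resource_requests{resource="cpu"})
A result above 1 means the pod runs beyond its request; the memory variant (using `container_memory_working_set_bytes` and `resource="memory"`) highlights the pods the kubelet will target first under MemoryPressure.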
Prevention & Hardening Strategies (2025)
Golden rules:
- Always set realistic `requests` and `limits` → no BestEffort!
- Use `PriorityClasses` for truly critical pods.
- Isolate workloads with `taints` and `tolerations`.
- Right-size nodes; on AWS EKS, Bottlerocket is a great minimal OS choice.
- Use `PodDisruptionBudgets` (PDBs) to control voluntary evictions (a sample PDB follows the taint & toleration example below).
Priority Class Example
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000
globalDefault: false
description: "Run this only for critical workloads."
Taint & Toleration Example
kubectl taint nodes node1 dedicated=critical:NoSchedule
tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "critical"
    effect: "NoSchedule"
Modern Autoscaling & Karpenter (2025)
Horizontal Pod Autoscaler (HPA)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
Vertical Pod Autoscaler (VPA)
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: api
  updatePolicy:
    updateMode: "Auto"
Karpenter
By 2025, Karpenter is the go-to next-gen node autoscaler for AWS EKS. It's:
- Faster than Cluster Autoscaler.
- Ephemeral storage-aware.
- GPU-aware.
- Optimized for FinOps.
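A minimal NodePool sketch under the karpenter.sh v1 API (values are illustrative; check the Karpenter docs for your version):
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default             # assumes an EC2NodeClass named "default" exists
  limits:
    cpu: "100"                    # cap total CPU Karpenter may provision
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized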
Stateful Workloads: Extra Protection
StatefulSets like DBs can’t just be restarted on a new node. If they get evicted due to disk pressure, you risk data loss.
Best practices:
- Use `PodDisruptionBudgets` to ensure quorum.
- Use `anti-affinity` rules to spread pods across nodes.
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
              - key: app
                operator: In
                values:
                  - my-db
          topologyKey: "kubernetes.io/hostname"
The FinOps Cost of Evictions
Evictions mean:
- Evicted pods reschedule → extra API calls → overhead.
- Overloaded nodes → poor bin-packing → wasted capacity.
- Repeated cold starts → cost spikes for ephemeral storage, IOPS, or GPUs.
FinOps lesson: Proper requests, limits, and autoscaling keep workloads efficient and save dollars.
Production War Story
Incident: A media startup saw their EKS pods evicted every Friday evening. Root cause? Build agents dumping hundreds of GBs of logs to ephemeral storage → DiskPressure → critical backend pods evicted instead of the noisy build agents.
Fix:
- Added ephemeral storage limits.
- Moved logs to an S3 sidecar.
- Enabled Karpenter for smarter scale-up.
Result: Evictions dropped by 90%, infra costs fell 20%.
Troubleshooting Flowchart
Pod evicted? Start here:
1. `kubectl describe pod` → check `Reason`.
2. `kubectl describe node` → check `Conditions`.
3. Check `requests`/`limits` → fix BestEffort pods (a one-liner to find them follows this list).
4. Check ephemeral storage → add limits.
5. Tune autoscaling → HPA/VPA/Karpenter.
6. Add Prometheus/Grafana → watch node signals.
7. Protect StatefulSets → PDBs, anti-affinity.
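A quick way to find BestEffort pods cluster-wide is to filter on the pod's `status.qosClass` field (a minimal sketch; adjust the output columns to taste):
kubectl get pods -A -o jsonpath='{range .items[?(@.status.qosClass=="BestEffort")]}{.metadata.namespace}{"\t"}{.metadata.name}{"\n"}{end}'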
Final YAML Snippets & Commands
List evicted pods
kubectl get pods --all-namespaces --field-selector=status.phase=Failed
Check resource usage
kubectl top nodes
kubectl top pods --all-namespaces
Describe node conditions
kubectl describe node <node-name>
Test autoscaling
kubectl scale deployment <name> --replicas=20
Advanced Signals, Debugging & Future Roadmap
For readers who want to go beyond day-to-day ops and truly master pod eviction mechanics, here are a few advanced layers to understand and watch as clusters grow more demanding in 2025.
TopologyManager & NUMA Awareness
Modern workloads — like HPC, ML training, or GPU-accelerated AI inferencing — push clusters to squeeze the most out of CPUs, memory, and local PCIe or NUMA nodes.
The `TopologyManager` aligns pod resource placement for better performance:
- It coordinates the CPU Manager, Device Manager, and Memory Manager.
- On NUMA-aware nodes, it ensures that CPU cores, attached memory, and devices like GPUs or FPGAs are aligned on the same physical socket.
Poor alignment causes cross-socket traffic → higher latency → wasted compute → unpredictable pod resource starvation → possible eviction or performance degradation.
Practical impact:
If you see pods getting evicted or throttled under heavy HPC/AI load, check whether your `TopologyManager` and `CPUManager` policies are pinned and aligned. The policy is set in the kubelet configuration:
# KubeletConfiguration
topologyManagerPolicy: "single-numa-node"  # or "none", "best-effort", "restricted"
For high-performance workloads, prefer `single-numa-node` where possible, but make sure your nodes actually have the required NUMA layout.
MemoryManager Enhancements (KEP-3575)
With Kubernetes 1.30+, the MemoryManager can reserve huge pages and align memory allocation for specialized workloads.
Why it matters:
In some HPC or ML nodes, improper memory pinning leads to memory fragmentation, page faults, or cross-NUMA access. This can stress the node's available RAM faster, accidentally triggering `MemoryPressure` signals → eviction of unrelated pods.
Reference: KEP-3575: MemoryManager Enhancements
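Enabling the static policy is a kubelet configuration change. A minimal sketch (the field names are the real KubeletConfiguration API; the reservation size is illustrative):
# KubeletConfiguration
memoryManagerPolicy: "Static"
reservedMemory:
  - numaNode: 0
    limits:
      memory: "1Gi"  # illustrative reservation for system daemons on NUMA node 0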
NodeLocal DNSCache
A subtle but real cause of `PIDPressure` is excessive DNS traffic:
- Large microservice architectures often generate thousands of DNS lookups per second.
- Each lookup can spawn short-lived resolver processes or connections.
- Heavy burst DNS traffic can push a node over its PID or CPU threshold.
Solution: deploy NodeLocal DNSCache:
- It runs a local CoreDNS instance on each node.
- It reduces the DNS query load on kube-dns.
- It speeds up lookups and prevents spikes that could indirectly lead to eviction.
kubectl apply -f https://k8s.io/examples/admin/dns/dns-cache.yaml
For large clusters, this is a best practice.
Ephemeral Containers for Live Debugging
When investigating a node under pressure, ephemeral containers can be your best live debugging tool:
- Unlike normal containers, ephemeral containers can be injected into a running pod.
- They are useful for exploring file usage, logs, or networking inside pods stuck in CrashLoopBackOff or Terminating states.
- They are great for analyzing why ephemeral storage or PIDs are spiking.
Example:
kubectl debug -it <pod-name> --image=busybox --target=<container-name>
You get a shell inside the namespace of the target container without restarting it.
These advanced signals (`TopologyManager`, `MemoryManager`, NodeLocal DNSCache, and ephemeral container debugging) won't matter for every workload.
But for HPC, AI, GPU workloads, or high-density SaaS clusters, they can make the difference between:
- Smooth operations, or
- Mystery evictions and wasted cloud spend.
Keep an eye on the latest Kubernetes Enhancement Proposals (KEPs) for resource management; they're the roadmap to what's next.
Conclusion & Next Steps
Kubernetes evictions are not your enemy — they’re your cluster’s emergency brake, designed to keep nodes alive when resources run out. But when evictions catch you off guard, they can quietly break critical apps, ruin SLAs, and drain cloud budgets with hidden rescheduling and cold-start costs.
The fix is not luck — it’s engineering discipline.
By mastering QoS classes, setting realistic resource requests and limits, and enforcing ephemeral storage controls, you decide which workloads survive when resources run thin.
In 2025 and beyond, tools like Karpenter, Bottlerocket, and smarter scheduling policies make right-sizing nodes and automating placement far more efficient — but they only pay off when paired with good FinOps hygiene. Unexpected evictions are expensive; preventing them is cheap compared to the cost of lost trust and wasted infrastructure.
If you take only one action today:
1. Audit your workloads.
2. Check your BestEffort pods.
3. Set storage limits.
4. Deploy autoscaling.
5. Monitor node signals before the `kubelet` must react for you.
When you combine sound resource governance with modern Kubernetes scheduling, you run leaner, break less often, and spend smarter, all while keeping your clusters resilient under real-world production pressure.
Evictions keep your nodes healthy.
Preparation keeps your business healthy.
More Resources
Kubernetes Pod Eviction: Commands & Usage Reference
| Command/YAML | What It Does |
|---|---|
| `kubectl get events --sort-by='.lastTimestamp' --field-selector reason=Evicted` | List recent events, filtered to show pods evicted by the kubelet. |
| `kubectl describe pod <pod-name>` | Inspect pod status; look for `Reason: Evicted` or `OOMKilled`. |
| `kubectl describe node <node-name>` | View node conditions like `MemoryPressure`, `DiskPressure`, `PIDPressure`. |
| `kubectl top nodes` | Show current CPU and memory usage for all nodes. |
| `kubectl top pods --all-namespaces` | Show resource usage for all pods in all namespaces. |
| `kubectl taint nodes node1 dedicated=critical:NoSchedule` | Mark a node with a taint to repel unwanted pods unless they tolerate it. |
| `kubectl scale deployment <name> --replicas=20` | Manually scale a Deployment up or down to test autoscaling behavior. |
| `kubectl debug -it <pod-name> --image=busybox --target=<container-name>` | Launch an ephemeral container for live debugging inside a running pod. |
| HPA YAML | Define a Horizontal Pod Autoscaler that adjusts replicas based on CPU or memory. |
| VPA YAML | Define a Vertical Pod Autoscaler that automatically adjusts pod requests/limits. |
| PriorityClass YAML | Create a class that prioritizes critical pods to survive eviction longer. |
| Tolerations YAML | Allow pods to tolerate node taints (e.g., critical workload isolation). |
| PodDisruptionBudget YAML | Ensure a minimum number of pods stay available during voluntary disruptions. |
| Ephemeral Storage Limits YAML | Explicitly limit ephemeral-storage to prevent unbounded disk usage. |
| `kubectl apply -f https://k8s.io/examples/admin/dns/dns-cache.yaml` | Deploy NodeLocal DNSCache for local DNS resolution to reduce PID load. |