How to Prevent Critical Pod Eviction in Kubernetes: Resource Pressure, QoS, Ephemeral Storage 2025

Stop unexpected pod evictions in Kubernetes. Learn how to handle resource pressure, set proper QoS classes, manage ephemeral storage limits, and use tools like Karpenter to keep your production clusters stable and cost-efficient in 2025.

When a node runs short on resources (memory, disk, or process IDs), the kubelet kicks in its survival instincts: it evicts pods to protect the node’s health.

In cloud-native production, surprise evictions mean downtime, broken pipelines, and hidden cost explosions.

This guide breaks down:

  • Exactly how evictions work

  • The role of QoS classes

  • What changed in Kubernetes by 2025

  • How to prevent costly resource pressure

  • Production-ready YAML and real kubectl commands

What Is Critical Pod Eviction?

Eviction is the kubelet’s last-resort measure to free up resources. If your node’s RAM or disk crosses thresholds, the kubelet forcibly terminates lower-priority pods until resource health is restored.

Eviction is about the node staying healthy, not about whether your app wants to stay alive.

Critical Pod Eviction: Quick Reference Table

What: Eviction means the kubelet forcefully removes pods from a node to free up critical resources (memory, disk, PIDs). It’s not a crash — it’s a protective action.

How: The kubelet monitors node conditions. When resource usage crosses defined thresholds (e.g., low available memory, disk full, too many PIDs), it picks lower-priority pods for eviction based on their QoS class.

Why: To keep the node itself healthy. If the node runs out of memory, disk, or PIDs, all workloads would fail. Eviction sacrifices the least critical pods to protect the whole system.

When: In real time, as soon as a node hits resource pressure thresholds — often triggered by unexpected usage spikes, log file growth, runaway processes, or unbounded ephemeral storage.

Where: On any Kubernetes worker node under resource pressure. Most common in high-density clusters, stateful workloads with heavy disk IO, or legacy apps with poor resource requests.

What To Do:

1. Set realistic requests & limits to get higher QoS.
2. Control ephemeral storage with explicit limits.
3. Use HPA/VPA and Karpenter for right-sizing.
4. Monitor MemoryPressure, DiskPressure, PIDPressure.
5. Protect critical workloads with PriorityClasses and PDBs.
6. Test failover & node drain scenarios regularly.

See Eviction Events:

kubectl get events --sort-by='.lastTimestamp' --field-selector reason=Evicted

[Figure: Kubernetes Critical Pod Eviction Workflow]

 OOMKill vs Eviction: Know the Difference

Many confuse OOMKills and Evictions, but they’re different:

OOMKill

  • Triggered by: the Linux kernel (OOM Killer)
  • Why: a container uses more memory than its limit
  • Result: the container is killed instantly
  • Fix: adjust container limits

Eviction

  • Triggered by: the kubelet
  • Why: the node as a whole hits pressure thresholds
  • Result: the pod is gracefully terminated (or forcefully if needed)
  • Fix: adjust node sizing, requests, quotas

How to spot an OOMKill

kubectl describe pod <pod-name>
# Look for 'OOMKilled' in the container status.

How to spot an Eviction

kubectl describe pod <pod-name>
# Status: Failed, Reason: Evicted
# Message: The node had condition: [MemoryPressure].

Understanding Kubernetes QoS Classes

When the node needs to decide which pods live or die, it looks at their QoS class.

  • Guaranteed: every container has requests equal to limits for both CPU and memory. Evicted last.

  • Burstable: at least one container sets CPU or memory requests, but the pod doesn’t meet the Guaranteed criteria (typically requests < limits). Evicted after BestEffort pods.

  • BestEffort: no requests or limits at all. Evicted first.

Example: Guaranteed QoS

resources:
  requests:
    cpu: "500m"
    memory: "256Mi"
  limits:
    cpu: "500m"
    memory: "256Mi"
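
To confirm which QoS class the API server actually assigned, read it straight off the pod status:

kubectl get pod <pod-name> -o jsonpath='{.status.qosClass}'
# Prints Guaranteed, Burstable, or BestEffort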

What Triggers Eviction? Node Signals & Conditions

The kubelet watches signals — basically usage vs thresholds.

  • MemoryPressure (default: memory.available < 100Mi): node RAM is nearly exhausted.

  • DiskPressure (default: nodefs.available < 10%): node disk usage is too high.

  • PIDPressure: too many processes; the node’s process table limit is being hit.

Check node status:

kubectl describe node <node-name>

Look for:

Conditions:
  Type             Status
  ----             ------
  MemoryPressure   True
  DiskPressure     True
  PIDPressure      False
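
The thresholds themselves are configurable per node through the kubelet config file. A minimal sketch of a KubeletConfiguration, assuming you manage kubelet configuration on your nodes (the values here are placeholders to tune for your node sizes):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:                    # kubelet evicts immediately once these are crossed
  memory.available: "200Mi"
  nodefs.available: "10%"
  pid.available: "5%"
evictionSoft:                    # evicted only after the grace period below
  memory.available: "500Mi"
evictionSoftGracePeriod:
  memory.available: "1m"
evictionMaxPodGracePeriod: 60    # max seconds granted to pods during soft eviction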

Ephemeral Storage Evictions: The Silent Killer

Ephemeral storage (a.k.a. container scratch space) is one of the top reasons for unexpected pod evictions in real clusters.

How to check ephemeral usage

kubectl describe pod <pod-name> | grep -i ephemeral

Always set ephemeral storage limits:

resources:
  requests:
    ephemeral-storage: "1Gi"
  limits:
    ephemeral-storage: "2Gi"
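
Scratch data written into an emptyDir volume counts against the same budget, and you can cap it there too with sizeLimit (the volume name below is just an example):

volumes:
- name: scratch              # hypothetical volume name
  emptyDir:
    sizeLimit: "1Gi"         # the pod is evicted if the volume grows past this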

Monitoring & Early Detection

Smart teams detect node pressure before the kubelet does.

Prometheus Rules Example

- alert: NodeMemoryPressure
  expr: kube_node_status_condition{condition="MemoryPressure",status="true"} == 1
  for: 5m
  labels:
    severity: critical

- alert: NodeDiskPressure
  expr: kube_node_status_condition{condition="DiskPressure",status="true"} == 1
  for: 5m

Useful Grafana panels (sample queries after the list):

  • node allocatable vs used

  • pod requests vs usage

  • evicted pods over time
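
The queries below are a sketch of those panels; they assume kube-state-metrics, node-exporter, and cAdvisor metrics are scraped under their default names, so adjust them to your setup:

# node allocatable vs used (memory)
kube_node_status_allocatable{resource="memory"}
node_memory_MemAvailable_bytes

# pod CPU requests vs actual usage
sum by (namespace, pod) (kube_pod_container_resource_requests{resource="cpu"})
sum by (namespace, pod) (rate(container_cpu_usage_seconds_total[5m]))

# evicted pods over time
sum(kube_pod_status_reason{reason="Evicted"})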

Prevention & Hardening Strategies (2025)

 Golden rules:

  1. Always set realistic requests and limits → no BestEffort!

  2. Use PriorityClasses for truly critical pods.

  3. Isolate workloads with taints and tolerations.

  4. Right-size nodes — on AWS EKS, Bottlerocket is a great minimal OS choice.

  5. Use PodDisruptionBudgets (PDBs) to control voluntary evictions.

Priority Class Example

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000
globalDefault: false
description: "Run this only for critical workloads."
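
The class only takes effect once pods reference it. In a Deployment’s pod template (or any pod spec), point priorityClassName at the class defined above:

spec:
  priorityClassName: high-priority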

Taint & Toleration Example

Taint the node:

kubectl taint nodes node1 dedicated=critical:NoSchedule

Then add a matching toleration to the pod spec:

tolerations:
- key: "dedicated"
  operator: "Equal"
  value: "critical"
  effect: "NoSchedule"

Modern Autoscaling & Karpenter (2025)

Horizontal Pod Autoscaler (HPA)

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70

Vertical Pod Autoscaler (VPA)

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind:       Deployment
    name:       api
  updatePolicy:
    updateMode: "Auto"

Karpenter

By 2025, Karpenter is the go-to next-gen node autoscaler for AWS EKS (a minimal NodePool sketch follows the list below). It’s:

  • Faster than Cluster Autoscaler.

  • Ephemeral storage-aware.

  • GPU-aware.

  • Optimized for FinOps.
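
As a rough sketch only (the NodePool API and field names differ between Karpenter releases, so treat this as something to adapt rather than a drop-in manifest), a minimal NodePool that allows Spot and On-Demand capacity and caps total provisioned CPU could look like this:

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
      - key: karpenter.sh/capacity-type
        operator: In
        values: ["spot", "on-demand"]
      - key: kubernetes.io/arch
        operator: In
        values: ["amd64"]
      nodeClassRef:              # references an EC2NodeClass defined separately
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  limits:
    cpu: "1000"                  # cap the total CPU Karpenter may provision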

Stateful Workloads: Extra Protection

Stateful workloads like databases can’t simply be restarted on another node. If they get evicted due to disk pressure, you risk losing data or quorum.

Best practices:

  • Use PodDisruptionBudgets to ensure quorum (see the sketch after the anti-affinity example below).

  • Use anti-affinity rules to spread pods across nodes.

spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - my-db
        topologyKey: "kubernetes.io/hostname"
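
A minimal PodDisruptionBudget for the same hypothetical my-db workload might look like the snippet below (tune minAvailable to your quorum size). Keep in mind that PDBs only guard voluntary disruptions such as node drains; they do not stop kubelet pressure evictions:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-db-pdb
spec:
  minAvailable: 2              # e.g., keep quorum for a 3-replica database
  selector:
    matchLabels:
      app: my-db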

FinOps Impact: Evictions & Hidden Costs

Evictions mean:

  • Evicted pods must be rescheduled → extra API calls and scheduler churn → overhead.

  • Nodes overloaded → poor bin-packing → wasted capacity.

  • Repeated cold starts → cost spikes for ephemeral storage, IOPS, or GPUs.

 FinOps lesson: Proper requests, limits, and autoscaling keep workloads efficient and save dollars.

Production War Story

Incident: A media startup saw their EKS pods evicted every Friday evening. Root cause? Build agents were dumping hundreds of GBs of logs to ephemeral storage → DiskPressure → the kubelet evicted critical backend pods instead of the noisy build agents.

Fix:

  • Added ephemeral storage limits.

  • Shipped logs to S3 via a sidecar.

  • Enabled Karpenter for smarter scale-up.

Result: Evictions dropped by 90%, and infra costs fell 20%.

Troubleshooting Flowchart

Pod evicted? Start here:

1. kubectl describe pod → check Reason.
2. kubectl describe node → check Conditions.
3. Check requests/limits → fix BestEffort pods.
4. Check ephemeral storage → add limits.
5. Tune autoscaling → HPA/VPA/Karpenter.
6. Add Prometheus/Grafana → watch node signals.
7. Protect StatefulSets → PDBs, anti-affinity.

Final YAML Snippets & Commands

List evicted pods

kubectl get pods --all-namespaces --field-selector=status.phase=Failed
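
Once you have finished investigating, evicted pods (which linger as Failed objects) can be cleaned up; double-check the selector before running this against a production cluster:

kubectl delete pods --all-namespaces --field-selector=status.phase=Failed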

Check resource usage

kubectl top nodes
kubectl top pods --all-namespaces

Describe node conditions

kubectl describe node <node-name>

 Test autoscaling

kubectl scale deployment <name> --replicas=20

Advanced Signals, Debugging & Future Roadmap

For readers who want to go beyond day-to-day ops and truly master pod eviction mechanics, here are a few advanced layers to understand and watch as clusters grow more demanding in 2025.

TopologyManager & NUMA Awareness

Modern workloads — like HPC, ML training, or GPU-accelerated AI inferencing — push clusters to squeeze the most out of CPUs, memory, and local PCIe or NUMA nodes.

The TopologyManager aligns pod resource placement for better performance:

  • It coordinates CPU Manager, Device Manager, and Memory Manager.

  • For NUMA-aware nodes, it ensures that CPU cores, attached memory, and devices like GPUs or FPGAs are aligned on the same physical socket.

  • Poor alignment causes cross-socket traffic → higher latency → wasted compute → unpredictable pod resource starvation → possible eviction or performance degradation.

Practical impact:
If you see pods getting evicted or throttled under heavy HPC/AI load, check whether your TopologyManager and CPUManager policies are pinned and aligned.

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
topologyManagerPolicy: "single-numa-node"   # or: none | best-effort | restricted
cpuManagerPolicy: "static"                  # required for exclusive CPU pinning

For high-performance workloads, use single-numa-node where possible — but ensure your nodes have the required NUMA layout.

MemoryManager Enhancements (KEP-3575)

With Kubernetes 1.30+, the MemoryManager can reserve huge pages and align memory allocation for specialized workloads.

Why it matters:
In some HPC or ML nodes, improper memory pinning leads to memory fragmentation, page faults, or cross-NUMA access. This can stress the node’s available RAM faster, accidentally triggering MemoryPressure signals → eviction of unrelated pods.
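
A hedged sketch of the relevant kubelet settings (the Static policy requires a reservedMemory block consistent with your node’s NUMA layout and its kube/system reservations, so the values below are placeholders):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
memoryManagerPolicy: "Static"
reservedMemory:
- numaNode: 0
  limits:
    memory: "1Gi"              # placeholder; must match your node's reservations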

NodeLocal DNSCache

A subtle but real cause of PIDPressure is excessive DNS queries:

  • Large microservice architectures often generate thousands of DNS lookups per second.

  • Heavy, bursty lookup traffic can fan out into many short-lived processes and connections.

  • Heavy burst DNS traffic can push a node over its PID or CPU threshold.

Solution: Deploy NodeLocal DNSCache:

  • It runs a local CoreDNS instance on each node.

  • Reduces the DNS query load on kube-dns.

  • Speeds up lookups and prevents spikes that could indirectly lead to eviction.

kubectl apply -f https://k8s.io/examples/admin/dns/dns-cache.yaml

 For large clusters, this is a best practice.

Ephemeral Containers for Live Debugging

When investigating a node under pressure, ephemeral containers can be your best live debug tool:

  • Unlike normal containers, ephemeral containers can be injected into a running pod.

  • Useful for exploring file usage, logs, or networking inside pods that are in CrashLoop or Terminating state.

  • Great for analyzing why ephemeral storage or PIDs are spiking.

Example:

kubectl debug -it <pod-name> --image=busybox --target=<container-name>

You get a shell inside the namespace of the target container without restarting it.
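
From that shell, a couple of quick checks help pinpoint the pressure source (busybox tooling assumed; the /proc path trick works because --target shares the PID namespace and may require a root debug container):

ps                              # the target container's processes are visible here
du -sh /proc/<pid>/root/tmp     # inspect the target's filesystem via a PID taken from ps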

These advanced layers (TopologyManager, MemoryManager, NodeLocal DNSCache, and ephemeral container debugging) won’t matter for every workload.

But for HPC, AI, GPU workloads, or high-density SaaS clusters, they can make the difference between:

  • Smooth operations, or

  • Mystery evictions and wasted cloud spend.

Keep an eye on the latest Kubernetes Enhancement Proposals (KEPs) for resource management; they’re the roadmap to what’s next.

Conclusion & Next Steps

Kubernetes evictions are not your enemy — they’re your cluster’s emergency brake, designed to keep nodes alive when resources run out. But when evictions catch you off guard, they can quietly break critical apps, ruin SLAs, and drain cloud budgets with hidden rescheduling and cold-start costs.

The fix is not luck — it’s engineering discipline.
By mastering QoS classes, setting realistic resource requests and limits, and enforcing ephemeral storage controls, you decide which workloads survive when resources run thin.

In 2025 and beyond, tools like Karpenter, Bottlerocket, and smarter scheduling policies make right-sizing nodes and automating placement far more efficient — but they only pay off when paired with good FinOps hygiene. Unexpected evictions are expensive; preventing them is cheap compared to the cost of lost trust and wasted infrastructure.

If you act on only one thing today, make it this checklist:
1. Audit your workloads.
2. Check your BestEffort pods.
3. Set storage limits.
4. Deploy autoscaling.
5. Monitor node signals before the kubelet must react for you.

When you combine sound resource governance with modern Kubernetes scheduling, you run leaner, break less often, and spend smarter, all while keeping your clusters resilient under real-world production pressure.

Evictions keep your nodes healthy.
Preparation keeps your business healthy.

More Resources

Kubernetes Pod Eviction: Commands & Usage Reference

kubectl get events --sort-by='.lastTimestamp' --field-selector reason=Evicted
  List recent events, filtered to show pods evicted by the kubelet.

kubectl describe pod <pod-name>
  Inspect pod status; look for an Evicted reason or OOMKilled.

kubectl describe node <node-name>
  View node conditions like MemoryPressure, DiskPressure, PIDPressure.

kubectl top nodes
  Show current CPU and memory usage for all nodes.

kubectl top pods --all-namespaces
  Show resource usage for all pods in all namespaces.

kubectl taint nodes <node-name> key=value:NoSchedule
  Mark a node with a taint to repel unwanted pods unless they tolerate it.

kubectl scale deployment <name> --replicas=<number>
  Manually scale a Deployment up or down to test autoscaling behavior.

kubectl debug -it <pod-name> --image=busybox --target=<container-name>
  Launch an ephemeral container for live debugging inside a running pod.

HPA YAML
  Define a Horizontal Pod Autoscaler that adjusts replicas based on CPU or memory.

VPA YAML
  Define a Vertical Pod Autoscaler that automatically adjusts pod requests/limits.

PriorityClass YAML
  Create a class that prioritizes critical pods to survive eviction longer.

Tolerations YAML
  Allow pods to tolerate node taints (e.g., critical workload isolation).

PodDisruptionBudget YAML
  Ensure a minimum number of pods stay available during voluntary disruptions.

Ephemeral Storage Limits YAML
  Explicitly limit ephemeral-storage to prevent unbounded disk usage.

kubectl apply -f https://k8s.io/examples/admin/dns/dns-cache.yaml
  Deploy NodeLocal DNSCache for local DNS resolution to reduce PID load.