• Red Signals
  • Posts
  • Mastering Amazon EKS Upgrades: The Ultimate Senior-Level Guide

Mastering Amazon EKS Upgrades: The Ultimate Senior-Level Guide

A step-by-step blueprint to upgrading EKS clusters with zero downtime, full automation, and enterprise-grade best practices.

Upgrading Amazon EKS involves a detailed process of pre-planning, automation, validation, high availability, zero-downtime strategies, rollback handling, and continuous improvement. For DevOps Engineer, this blog provides a structured, enterprise-grade approach divided into three phases: Pre-Upgrade Planning, High Availability & Execution, and Post-Upgrade Validation & Automation.

AWS EKS Upgrade guide

1. What is an EKS Upgrade?

Amazon Elastic Kubernetes Service (EKS) is a managed service that runs Kubernetes clusters on AWS. An EKS upgrade is the process of updating the Kubernetes control plane and worker nodes to a newer Kubernetes version.

πŸ“Œ Includes:

  • Control Plane Upgrade (managed by AWS)

  • Managed Node Groups Upgrade (your responsibility)

  • Add-ons Upgrade: CoreDNS, kube-proxy, CNI, etc.

  • Compatibility Validation: APIs, CRDs, Helm charts

2. Why Are EKS Upgrades Critical?

Reason

Impact

Security Fixes

Patches critical vulnerabilities

Performance Gains

Leverages improvements and optimizations

Deprecated APIs

Prevents application failures

Support Lifecycle

AWS supports only 3 versions at a time

Add-on Compatibility

Ensures VPC CNI, CoreDNS work properly

Failing to upgrade can result in service downtime, security risk, or cluster being unsupported.

3. When to Perform an Upgrade

  • Every 3-6 months or when:

    • AWS announces EKS deprecation

    • Kubernetes releases a new minor version

    • Helm/CRDs show compatibility issues

Example Lifecycle:

  • v1.27 β†’ GA (now)

  • v1.26 β†’ Supported

  • v1.25 β†’ Supported

  • v1.24 β†’ Deprecated β†’ Must upgrade

A simple table showing EKS version β†’ Kubernetes version β†’ supported add-on versions.

EKS Version

K8s Version

CoreDNS

kube-proxy

VPC CNI

1.27

1.27.x

1.10.x

v1.27.x

v1.12.x

1.28

1.28.x

1.11.x

v1.28.x

v1.13.x

Helps engineers quickly identify if their cluster is compatible before upgrading.

4. Upgrade Overview & Architecture

  • πŸ“ Control Plane: AWS-managed (upgrade manually triggered)

  • πŸ“ Node Groups: You manage, must recreate or upgrade

  • πŸ“ Workloads: Run tests in staging before prod

  • πŸ“ Add-ons: Upgrade to latest compatible version

⚠️ Important: Always upgrade Control Plane β†’ Node Groups β†’ Add-ons β†’ Workloads

5. Pre-Upgrade Planning, Backup & Validation

🧠 Why It’s Crucial

  • To ensure data integrity, rollback capability, and compatibility.

βœ… Key Steps:

πŸ”Ή 1. Backup Everything

# Backup etcd-level resources using Velero
velero install --provider aws --bucket eks-backups --backup-location-config region=us-east-1
velero backup create pre-upgrade-backup

# Snapshot EBS volumes (optional for PVs)
aws ec2 create-snapshot --volume-id vol-xyz
  • Encrypt S3 bucket, enable cross-region replication.

πŸ”Ή 2. Review Release Notes

# EKS & Kubernetes version notes
https://docs.aws.amazon.com/eks/latest/userguide/kubernetes-versions.html

πŸ”Ή 3. Scan for Deprecated APIs

# kubent: Detect removed/deprecated APIs
kubent

# Pluto: Scan Helm releases
pluto detect-helm -o markdown

πŸ”Ή 4. Test in a Staging Cluster

eksctl create cluster --name staging-cluster --version 1.27 --region us-east-1 \
--nodegroup-name staging-nodes --nodes 2
  • Deploy current workloads, test integrations, validate metrics, simulate DR.

πŸ”Ή 5. Define a Change Plan

  • Stakeholder communication

  • Rollback procedures

  • Maintenance window

AWS EKS Upgrade

6. High Availability & Zero Downtime Upgrade

🧠 Why It’s Crucial

  • To prevent outages, data loss, and SLA violations.

βœ… Step-by-Step

πŸ”Έ 1. Ensure HA Architecture

  • Multi-AZ node groups

  • Multi-AZ ALB/NLB

πŸ”Έ 2. Harden Workloads

# Set PodDisruptionBudgets
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: my-app
  • Add anti-affinity rules, readiness/liveness probes.

πŸ”Έ 3. Upgrade Control Plane

eksctl upgrade cluster --name prod-cluster --version 1.27 --approve

πŸ”Έ 4. Upgrade Node Groups

eksctl upgrade nodegroup --name ng-app --cluster prod-cluster --kubernetes-version 1.27 --approve
  • Use rolling updates, maxUnavailable=1

πŸ”Έ 5. Upgrade Add-ons

# CoreDNS
eksctl update addon --name coredns --cluster prod-cluster --force

# VPC CNI
eksctl update addon --name vpc-cni --cluster prod-cluster --force

# kube-proxy
eksctl update addon --name kube-proxy --cluster prod-cluster --force

πŸ”Έ 6. Monitor Everything

  • Use CloudWatch, Prometheus, Grafana, Loki

  • Set alert rules for app health, latency, error rate

7. Post-Upgrade Validation, Rollback & Automation

Post-Upgrade Validation, Rollback & Automation Health checks, logging, rollback triggers

βœ… 1. Validate Workloads

kubectl get pods -A
kubectl logs -n app-namespace my-app-xxxx
kubectl top pod -n app-namespace
  • Validate horizontal/vertical pod autoscaler behavior

  • Validate ingress, DNS, metrics, dashboards

βœ… 2. Rollback Strategy

# Helm rollback
helm rollback my-release 1

# Revert GitOps manifest
git revert HEAD
kubectl apply -f manifests/

# Recreate nodegroup with older AMI
eksctl create nodegroup --version 1.26 ...

βœ… 3. Post-Upgrade Hygiene

  • Delete old nodegroups

  • Clean up deprecated CRDs

  • Update runbooks, CI/CD, dashboards

  • Document lessons learned

8. Automation Example: GitHub Actions

name: EKS Upgrade Pipeline
on:
  workflow_dispatch:

jobs:
  upgrade:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v2

      - name: Upgrade Control Plane
        run: |
          eksctl upgrade cluster --name prod --version 1.27 --approve

      - name: Upgrade Addons
        run: |
          eksctl update addon --name coredns --cluster prod --force

9. EKS Upgrade Process Flowchart

+------------------------+
|  Pre-Upgrade Planning  |
+------------------------+
           |
           v
+-----------------------------+
|  Backup & Compatibility     |
|  - eksctl snapshot          |
|  - Velero backup            |
|  - kubent/pluto scan        |
+-----------------------------+
           |
           v
+-----------------------------+
|  Stage & Validate in Staging|
|  - Mirror prod workloads    |
|  - Test DR & compatibility  |
+-----------------------------+
           |
           v
+----------------------------+
|  Control Plane Upgrade     |
|  - eksctl upgrade cluster  |
+----------------------------+
           |
           v
+------------------------------+
|  Managed Node Group Upgrade  |
|  - Rolling updates           |
|  - PodDisruptionBudgets      |
+------------------------------+
           |
           v
+-----------------------------+
|  Add-ons Upgrade            |
|  - CoreDNS, kube-proxy      |
|  - VPC CNI, CSI drivers     |
+-----------------------------+
           |
           v
+-----------------------------+
|  Post-Validation & Cleanup  |
|  - Check logs, probes       |
|  - Remove old node groups   |
+-----------------------------+
           |
           v
+-----------------------------+
|  Automation & Observability |
|  - CI/CD, GitOps            |
|  - Monitoring & alerts      |
+-----------------------------+

10. Final Thoughts & Best Practices

Practice

Why It Matters

Always test in staging

Prevents breaking prod

Automate with GitOps

Repeatability and rollback

Monitor aggressively

Detect regressions fast

Document the upgrade

For compliance & future use

Follow AWS upgrade announcements

Stay ahead of deprecations

Upgrading EKS is more than just a version bump. It is a controlled, observable, secure, and team-aligned DevOps lifecycle that should be automated, reproducible, and rollback-safe. Follow these practices, and you'll not only ace upgrades but also scale your Kubernetes architecture with confidence.

πŸ“š References