A comprehensive tale of migrating a production AWS Kubernetes cluster with 6,000+ resources, 46 CRDs, 7 SSL certificates, and 12 namespaces, all with zero downtime
Introduction: The Challenge Ahead
Upgrading a production-grade Kubernetes cluster is never a walk in the park—especially when it spans multiple environments, critical workloads, and tight deadlines.
So when it was time to migrate a client's 3-4 year old Amazon EKS cluster from v1.26 to v1.33, I knew it wouldn't just be a version bump—it would be a battlefield.
This cluster wasn't just any cluster—it was a complex ecosystem running critical healthcare applications with:
- 46 Custom Resource Definitions (CRDs) across multiple systems
- 7 production domains with SSL certificates
- Critical data in PostgreSQL databases
- Zero downtime tolerance for production services
- Complex networking with Istio service mesh
- Monitoring stack with Prometheus and Grafana
This is the story of how we successfully migrated this beast using a hybrid approach, the challenges we faced, and the lessons we learned along the way.
Chapter 1: The Reconnaissance Phase
Mapping the Battlefield
Before diving into the migration, we needed to understand exactly what we were dealing with. There was no GitOps and no manifest files; all we had was AWS access, Lens, and an outdated cluster that needed to be upgraded.
Kubernetes enforces a strict version skew policy, especially when you’re using managed services like Elastic Kubernetes Service (EKS).
The control plane must always be at most one minor version ahead of the kubelets (worker nodes), and all supporting tools—kubeadm, kubelet, kubectl, and add-ons—must respect the same version skew policy.
So what does this mean?
- If your control plane is running v1.33, your worker nodes can only be on v1.32 or v1.33. Nothing lower.
- And no, you can’t jump straight from v1.26 to v1.33. You must upgrade sequentially: v1.26 → v1.27 → v1.28 → ... → v1.33
Each upgrade step? A potential minefield of broken dependencies, deprecated APIs, and mysterious behavior.
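In practice, each hop on EKS means bumping the control plane first, then rolling the node groups and add-ons to match. A minimal sketch with the AWS CLI, assuming placeholder cluster and node group names:
# One hop at a time: bump the control plane, wait, then roll the nodes (placeholder names)
aws eks update-cluster-version --name my-cluster --kubernetes-version 1.27
aws eks wait cluster-active --name my-cluster

aws eks update-nodegroup-version --cluster-name my-cluster --nodegroup-name my-nodes
aws eks wait nodegroup-active --cluster-name my-cluster --nodegroup-name my-nodes
# ...then repeat for 1.28, 1.29, and so on up to 1.33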
💀 The Aging Cluster
The cluster I inherited was running Kubernetes v1.26—with some workloads and CRDs that hadn’t been touched in about 4 years.
It was ancient. It was fragile. And it was about to get a rude awakening.
🧪 First Attempt: The “By-the-Book” Upgrade
I tried to play nice.
The goal: upgrade the cluster manually, step by step, from v1.26 all the way to v1.33.
But the moment I moved from v1.26 → v1.27, the floodgates opened:
- Pods crashing from all directions
- Incompatible controllers acting out
- Deprecation warnings lighting up the logs like Christmas trees
Let’s just say—manual upgrades were off the table.
🛠️ Second Attempt: The Manifest Extraction Strategy
Time to pivot.
The new plan?
Spin up a fresh EKS cluster running v1.33, then lift-and-shift resources from the old cluster.
Step 1: Extract All Resources
From the old cluster I ran:
kubectl get all --all-namespaces -o yaml > all-resources.yaml
Then I backed up other critical components:
- ConfigMaps
- Secrets
- PVCs
- Ingresses
- CRDs
- RBAC
- ServiceAccounts
kubectl get configmaps,secrets,persistentvolumeclaims,ingresses,customresourcedefinitions,roles,rolebindings,clusterroles,clusterrolebindings,serviceaccounts --all-namespaces -o yaml > extras.yaml
Step 2: Apply to the New Cluster
Switched context:
kubectl config use-context <cluster-arn>
And then:
kubectl apply -f all-resources.yaml -f extras.yaml
Boom—in one swoop, everything started deploying into the new cluster.
For a moment, I thought:
“Wow… that was easy. Too easy.”
🚨 Reality Check: The Spaghetti Hit the Fan
After 8 hours of hopeful waiting, the nightmare unfolded:
- CrashLoopBackOff
- ImagePullBackOff
- Pending Pods
- Service Not Reachable
- VolumeMount and PVC errors everywhere
It was YAML spaghetti, tangled and broken.
The old cluster’s legacy configurations simply did not translate cleanly to the modern version.
And now, I had to dig in deep—resource by resource, namespace by namespace—to rebuild sanity. I had neither the time nor the luxury for that.
⚙️ Third Attempt: Enter Velero
The next strategy? Use Velero.
Install it in the old cluster, run a full backup, switch contexts, and restore everything into the shiny new v1.33 cluster.
Simple, right?
Not quite.
Velero pods immediately got stuck in Pending.
Why?
- Insufficient resources in the old cluster
- CNI-related issues that blocked network provisioning
So instead of backup and restore magic, I found myself deep in another rabbit hole.
🧠 Fourth Attempt: Organized Manifest Extraction — The Breakthrough
Out of frustration, I raised the issue during a session in the AWS DevOps Study Group.
That’s when Theo and Jaypee stepped in with game-changing advice:
“Forget giant YAML dumps. Instead, extract manifests systematically, grouped by namespace and resource type. Organize them in folders. Leverage Amazon Q in VS Code to make sense of the structure.”
It was a lightbulb moment💡.
I restructured the entire migration approach based on their idea—breaking down the cluster into modular, categorized directories.
It brought clarity, control, and confidence back to the process.
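Here is roughly what that systematic extraction looked like; a sketch, where the resource-type list and folder layout are illustrative rather than the exact script we used:
# Sketch: dump manifests per namespace and per resource type into a folder tree
for ns in $(kubectl get namespaces -o jsonpath='{.items[*].metadata.name}'); do
  for kind in deployments statefulsets services configmaps secrets \
              serviceaccounts roles rolebindings ingresses persistentvolumeclaims; do
    mkdir -p "manifests/$ns/$kind"
    for name in $(kubectl get "$kind" -n "$ns" -o jsonpath='{.items[*].metadata.name}'); do
      kubectl get "$kind" "$name" -n "$ns" -o yaml > "manifests/$ns/$kind/$name.yaml"
    done
  done
done
This per-namespace, per-kind layout is what made the later ordered deployment possible.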
📦 The CRD Explosion
Once things were neatly organized, the real scale of the system came into focus.
Major CRDs We Had to Handle:
- Istio Service Mesh: 12 CRDs managing traffic routing and security
- Prometheus/Monitoring: 8 CRDs for metrics and alerting
- Cert-Manager: 7 CRDs handling SSL certificate automation
- Velero Backup: 8 CRDs for disaster recovery
- AWS Controllers: 11 CRDs for cloud integration
🧮 Total: 46 CRDs — each one a potential migration minefield
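If you need to take a similar inventory, a couple of one-liners along these lines do the job (a sketch):
# How many CRDs are we dealing with, and which systems own them?
kubectl get crd --no-headers | wc -l
kubectl get crd -o jsonpath='{range .items[*]}{.spec.group}{"\n"}{end}' | sort | uniq -c | sort -rn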
🔍 Custom Resources Inventory
Beyond the CRDs themselves, the custom resources were no less intimidating:
- 11+ TLS Certificates across multiple namespaces
- 6+ ServiceMonitors for Prometheus scraping
- Multiple PrometheusRules for alerting
- VirtualServices and DestinationRules for Istio routing
The message was clear:
This wasn’t a “one-file kubectl apply” kind of migration.
✅ API Compatibility Victory
With the structure in place, we ran API compatibility checks using Pluto and a custom script generated via Amazon Q in VS Code:
./scripts/api-compatibility-check.sh
Result:
✅ No deprecated or incompatible API versions found.
A small win—but a huge morale boost in a complex migration journey.
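The script itself isn't reproduced here, but a minimal sketch of that kind of check with Pluto could look like this (the target version flag and paths are assumptions):
#!/bin/bash
# api-compatibility-check.sh (sketch): scan extracted manifests and Helm releases
# for APIs removed or deprecated in the target version
set -euo pipefail

TARGET="k8s=v1.33.0"   # assumed target version

pluto detect-files -d manifests/ --target-versions "$TARGET" -o wide
pluto detect-helm --target-versions "$TARGET" -o wide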
📦 Chapter 2: The Data Dilemma
💡 Choosing Our Weapon: Manual EBS Snapshots
When it came to migrating persistent data, we faced a critical decision. Several options were on the table:
- Velero backups – our usual go-to, but ruled out due to earlier issues with pod scheduling and CNI errors.
- Database dumps – possible, but slow, error-prone, and fragile under pressure.
- Manual EBS snapshots – low-level, reliable, and simple
After weighing the risks, we went old-school with manual EBS snapshots.
They offered direct access to data volumes with minimal tooling—and in a high-stakes migration, simplicity is a virtue.
Sometimes, the old ways are still the best ways.
🛠️ Automation to the Rescue
To streamline the snapshot process, I wrote a simple backup script:
./scripts/manual-ebs-backup.sh
It handled the tagging and snapshot creation for each critical volume, ensuring traceability and rollback capability.
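The script isn't shown in full, but its core was little more than this (a sketch; the volume IDs and tags below are placeholders):
#!/bin/bash
# manual-ebs-backup.sh (sketch): snapshot each critical volume with traceable tags
set -euo pipefail

VOLUMES="vol-aaaa1111bbbb2222c vol-dddd3333eeee4444f"   # placeholder volume IDs

for vol in $VOLUMES; do
  aws ec2 create-snapshot \
    --volume-id "$vol" \
    --description "pre-migration backup of $vol" \
    --tag-specifications "ResourceType=snapshot,Tags=[{Key=purpose,Value=eks-migration},{Key=source-volume,Value=$vol}]"
done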
🔐 Critical Volumes Backed Up
Here are some of the most important data volumes we preserved:
- tools/pgadmin-pgadmin4 → snap-06257a13c49e125b1
- sonarqube/data-sonarqube-postgresql-0 → snap-0e590f608a631fcc3
Each snapshot became a lifeline, preserving vital stateful components of our workloads as we prepped the new cluster.
🏗️ Chapter 3: Building the New Kingdom
Once the old cluster was archived and dissected, it was time to construct the new realm—clean, modern, and battle-hardened.
⚙️ The Foundation: CRD Installation Order Matters
One of the most overlooked but mission-critical lessons we learned during this journey:
The order in which you install your CRDs can make or break your cluster.
Install them in the wrong sequence, and you’ll find yourself swimming in cryptic errors, broken controllers, and cascading failures that seem to come from nowhere.
After a lot of trial and error (Istio in particular gave me a lot of trouble), I landed on a battle-tested CRD deployment sequence:
# 1. Cert-Manager (many other components rely on it for TLS provisioning)
helm install cert-manager jetstack/cert-manager \
--namespace cert-manager \
--create-namespace \
--set installCRDs=true
# 2. Monitoring Stack (metrics, alerting, dashboards)
helm install prometheus-stack prometheus-community/kube-prometheus-stack \
--namespace monitoring \
--create-namespace
# 3. AWS Integration (Load balancer controller, IAM roles, etc.)
helm install aws-load-balancer-controller eks/aws-load-balancer-controller \
-n kube-system
# 4. Service Mesh (Istio control plane)
istioctl install --set values.defaultRevision=default
🧘 **Pro Tip:** After each installation, wait until the operator and all dependent pods are fully healthy before continuing.
Kubernetes is fast… but rushing this step will cost you hours down the line.
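In practice that meant pausing between Helm installs with something like this (a sketch; verify the deployment names in your own cluster):
# Block until cert-manager is actually serving before installing anything that depends on it
kubectl wait --for=condition=Available deployment/cert-manager -n cert-manager --timeout=300s
kubectl wait --for=condition=Available deployment/cert-manager-webhook -n cert-manager --timeout=300s

# Then watch the next stack settle the same way before moving on
kubectl get pods -n monitoring -w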
🧬 Data Resurrection: Bringing Back Our State
With the new infrastructure laid out, it was time to resurrect the lifeblood of the platform—its data.
Using our EBS snapshots from earlier, we restored the volumes and re-attached them to their rightful claimants:
bash scripts/restore-ebs-volumes.sh
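Under the hood, the restore amounted to creating volumes from the snapshots and handing them back to Kubernetes as pre-provisioned PersistentVolumes. A sketch, where the size, zone, and storage class are placeholders and the EBS CSI driver is assumed:
#!/bin/bash
# restore-ebs-volumes.sh (sketch): recreate a volume from its snapshot, then hand it
# back to Kubernetes as a pre-provisioned PersistentVolume (EBS CSI driver assumed)
set -euo pipefail

SNAPSHOT_ID="snap-06257a13c49e125b1"   # from the backup phase
AZ="eu-central-1a"                     # placeholder availability zone

VOLUME_ID=$(aws ec2 create-volume \
  --snapshot-id "$SNAPSHOT_ID" \
  --availability-zone "$AZ" \
  --volume-type gp3 \
  --query 'VolumeId' --output text)

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pgadmin-restored
spec:
  capacity:
    storage: 10Gi              # placeholder size, match the original PVC
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: gp3        # placeholder storage class
  csi:
    driver: ebs.csi.aws.com
    volumeHandle: $VOLUME_ID
EOF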
Restored Volumes:
- tools/pgadmin → vol-0166bbae7bd2eb793
- sonarqube/postgresql → vol-0262e16e1bd5df028
Held my breath… and then—
✅ PersistentVolumes bound successfully
✅ StatefulSets recovered
✅ Pods restarted with their original data
It was official: our new kingdom had data, structure, and a beating heart.
🎭 Chapter 4: The Application Deployment Dance
The Dependency Choreography
Deploying applications in Kubernetes isn’t just about applying YAML files—it’s a delicate choreography of interdependent resources, where the order of execution can make or break your deployment.
Get the sequence wrong, and you’re looking at a cascade of errors:
missing secrets, broken RBAC, unbound PVCs, and pods stuck in limbo.
We approached it like conducting an orchestra—each instrument with its cue.
🪜 Step-by-Step Deployment Strategy
1. Foundation First: ServiceAccounts, ConfigMaps, and Secrets
These are the building blocks of your cluster environment.
No app should be launched before its supporting config and identity infrastructure are in place.
kubectl apply -f manifests/*/serviceaccounts/
kubectl apply -f manifests/*/configmaps/
kubectl apply -f manifests/*/secrets/
2. RBAC: Granting the Right Access
Once identities are in place, we assign the right permissions using Roles and RoleBindings—especially for monitoring and system tools.
kubectl apply -f manifests/monitoring/roles/
kubectl apply -f manifests/monitoring/rolebindings/
⚠️ Lesson: Don’t skip this step or your logging agents and monitoring stack will sit silently—failing without errors.
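A quick sanity check before moving on, using impersonation (the service account name here is only an example):
# Can the monitoring stack's service account actually read what it scrapes?
kubectl auth can-i list pods \
  --as=system:serviceaccount:monitoring:prometheus-stack-kube-prom-prometheus -n monitoring
kubectl auth can-i get endpoints \
  --as=system:serviceaccount:monitoring:prometheus-stack-kube-prom-prometheus -n monitoring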
3. Persistent Storage: Claim Before You Launch
Storage is like the stage on which your stateful applications perform.
We provisioned all PersistentVolumeClaims (PVCs) before deploying workloads to avoid CrashLoopBackOff errors related to missing mounts.
kubectl apply -f manifests/tools/persistentvolumeclaims/
kubectl apply -f manifests/sonarqube/persistentvolumeclaims/
4. Workloads: Let the Apps Take the Stage
With the foundation solid and access configured, it was time to deploy the actual workloads—both stateless and stateful.
kubectl apply -f manifests/tools/deployments/
kubectl apply -f manifests/sonarqube/statefulsets/
# ... and the rest
Status: Applications Deployed and Running ✅
At first glance, everything seemed perfect—pods were green, services were responsive, and dashboards were lighting up.
I exhaled.
But the celebration didn’t last long.
Behind those green pods were networking glitches, DNS surprises, and service discovery issues lurking in the shadows—ready to pounce.
🔐 Chapter 5: The Great SSL Certificate Saga
Just when I thought the migration was complete and everything was running smoothly, the ghost of SSL past returned to haunt.
The Mystery of the Expired Certificates
Just when we thought we were done, we discovered a critical issue:
NAMESPACE NAME CLASS HOSTS PORTS AGE
qaclinicaly bida-fe-clinicaly <none> bida-fe-qaclinicaly.example.net 80,443 59s
At first glance, it looked fine. But a quick curl and browser visit revealed a nasty surprise:
“Your connection is not private”
“This site’s security certificate expired 95 days ago”
Another issue that could have caused panic and confusion—but I stayed calm. We could fix this!
Upon further inspection, every certificate in the cluster was showing:
READY: False
Cert-manager was deployed. The pods were healthy. But nothing was being issued.
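The quickest way to see that state across the board (a sketch):
# Every Certificate in every namespace, with its READY state
kubectl get certificates --all-namespaces
# Drill into one to see why it is stuck
kubectl describe certificate bida-fe-qaclinicaly.example.net-crt -n qaclinicaly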
🔎 The Missing Link: ClusterIssuer
Digging deeper into the logs, I found the root cause:
The ClusterIssuer for Let’s Encrypt was missing entirely.
Without it, Cert-Manager had no idea how to obtain or renew certificates.
Somehow, it had slipped through the cracks during our migration process.
🛠️ The Quick Fix
Recreated the missing ClusterIssuer using the standard ACME configuration:
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: good-devops@example.com
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
    - http01:
        ingress:
          class: nginx
Applied it to the cluster
kubectl apply -f cluster-issuer.yaml
Despite the ClusterIssuer being present and healthy, the certificates still wouldn’t renew. The plot thickened...
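Tracing where issuance was stuck meant walking the cert-manager chain, roughly like this (a sketch):
# Follow the chain: Certificate -> CertificateRequest -> Order -> Challenge
kubectl get certificaterequests,orders,challenges --all-namespaces
kubectl describe challenges -n qaclinicaly
kubectl logs -n cert-manager deploy/cert-manager --tail=50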
⚠️ Chapter 6: The AWS Load Balancer Controller Nightmare
Just when I thought the worst was behind me, the AWS Load Balancer Controller decided to stir up fresh chaos.
🧩 The IAM Permission Maze
The first clue came from the controller logs—littered with authorization errors like this:
"error":"operation error EC2: DescribeAvailabilityZones, https response error StatusCode: 403, RequestID: 3ba25abe-7bb2-4b05-bb33-26fde9696931, api error UnauthorizedOperation: You are not authorized to perform this operation"
That 403 told me everything I needed to know:
The controller lacked the necessary IAM permissions to interact with AWS APIs.
What followed was a deep dive into the AWS IAM Policy abyss—where small misconfigurations can lead to hours of head-scratching and trial-and-error debugging.
🔐 The Fix: A Proper IAM Role and Trust Policy
To get the controller working, I created a dedicated IAM role with the required permissions using Amazon Q, and then annotated the Kubernetes service account to assume it.
# Create the IAM role
aws iam create-role \
--role-name AmazonEKS_AWS_Load_Balancer_Controller \
--assume-role-policy-document file://aws-lb-controller-trust-policy.json
# Attach the managed policy
aws iam attach-role-policy \
--role-name AmazonEKS_AWS_Load_Balancer_Controller \
--policy-arn arn:aws:iam::830714671200:policy/AWSLoadBalancerControllerIAMPolicy
# Annotate the controller's service account in Kubernetes
kubectl annotate serviceaccount aws-load-balancer-controller \
-n kube-system \
eks.amazonaws.com/role-arn=arn:aws:iam::830714671200:role/AmazonEKS_AWS_Load_Balancer_Controller \
--overwrite
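For reference, the trust policy file used above follows the standard IRSA pattern; a sketch, with the account ID and OIDC provider ID as placeholders:
# aws-lb-controller-trust-policy.json (sketch): allow the controller's service account
# to assume the role via the cluster's OIDC provider (IDs below are placeholders)
cat > aws-lb-controller-trust-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::111122223333:oidc-provider/oidc.eks.eu-central-1.amazonaws.com/id/EXAMPLEOIDCID"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.eu-central-1.amazonaws.com/id/EXAMPLEOIDCID:sub": "system:serviceaccount:kube-system:aws-load-balancer-controller"
        }
      }
    }
  ]
}
EOF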
With the IAM role in place and attached, I expected smooth sailing—but Kubernetes had other plans.
🌐 The Internal vs Internet-Facing Revelation
Even with the right permissions, certificates still weren’t issuing.
Let’s Encrypt couldn’t validate the ACME HTTP-01 challenge—and I soon discovered why.
Running this command:
aws elbv2 describe-load-balancers \
--names k8s-ingressn-ingressn-9a8b080581 \
--region eu-central-1 \
--query 'LoadBalancers[0].Scheme'
Returned:
"internal"
The NGINX ingress LoadBalancer was internal, which made it unreachable from the internet—completely blocking Let’s Encrypt from reaching the verification endpoint.
🛠️ The Fix: Force Internet-Facing Scheme
I updated the annotation on the NGINX controller service:
kubectl annotate svc ingress-nginx-controller \
-n ingress-nginx \
service.beta.kubernetes.io/aws-load-balancer-scheme=internet-facing \
--overwrite
This change recreated the LoadBalancer, this time with internet-facing access.
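Re-running the earlier check against the recreated LoadBalancer confirms the new scheme (a sketch; the LB gets a new name when it is recreated):
aws elbv2 describe-load-balancers \
  --region eu-central-1 \
  --query 'LoadBalancers[?Scheme==`internet-facing`].[LoadBalancerName,DNSName]' \
  --output table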
🌐 Chapter 7: The DNS Migration Challenge
The Automated Solution
Once the internet-facing LoadBalancer was live and SSL certs were flowing, there was still one critical piece left: DNS.
The new LoadBalancer came with a new DNS name, and I had seven production domains that needed to point to it.
Doing this manually in the Route 53 console?
Slow. Risky. Error-prone.
⚙️ The Automated Solution
To avoid mistakes and speed things up, I wrote a script to automate the DNS updates using the AWS CLI.
#!/bin/bash
HOSTED_ZONE_ID="Z037069025V45CB576XJD"
NEW_LB="k8s-ingressn-ingressn-testing12345-9287c75b76ge25zc.elb.eu-central-1.amazonaws.com"
NEW_LB_ZONE_ID="Z3F0SRJ5LGBH90"

update_dns_record() {
  local domain=$1
  aws route53 change-resource-record-sets \
    --hosted-zone-id "$HOSTED_ZONE_ID" \
    --change-batch "{
      \"Changes\": [{
        \"Action\": \"UPSERT\",
        \"ResourceRecordSet\": {
          \"Name\": \"$domain\",
          \"Type\": \"A\",
          \"AliasTarget\": {
            \"DNSName\": \"$NEW_LB\",
            \"EvaluateTargetHealth\": true,
            \"HostedZoneId\": \"$NEW_LB_ZONE_ID\"
          }
        }
      }]
    }"
}
By calling update_dns_record with each domain, I was able to quickly and safely redirect traffic to the new cluster.
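Calling it for each domain was then just a loop (a sketch; the domain list is abbreviated here, the full set follows below):
# Call the function once per production domain (list abbreviated, full set below)
DOMAINS="kafka-dev.example.net pgadmin-dev.example.net sonarqube.example.net"
for domain in $DOMAINS; do
  update_dns_record "$domain"
done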
✅ Domains migrated:
Here are the domains I successfully updated:
kafka-dev.example.net
pgadmin-dev.example.net
sonarqube.example.net
bida-fe-qaclinicaly.example.net
bida-gateway-qaclinicaly.example.net
bida-fe-qaprod.example.net
eduaid-admin-qaprod.example.net
Each one now points to the new LoadBalancer, resolving to the right service in the new EKS cluster.
🏁 Chapter 8: The Final Victory
⚔️ The Moment of Truth
After battling through IAM issues, LoadBalancer headaches, DNS rewiring, and countless YAML files, it all came down to one final moment: Would the certificates issue successfully?
I decided to start fresh and purge any leftover Cert-Manager resources to ensure there were no stale or broken states hanging around:
# Clean slate approach
kubectl delete challenges --all --all-namespaces
kubectl delete orders --all --all-namespaces
kubectl delete certificates --all --all-namespaces
Then I waited.....
Refreshed.....
Checked logs.....
Waited some more....
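While waiting, a watch on the Certificate resources (and a glance at the issuer) shows things flip over in real time:
# Watch certificates flip to READY, and confirm the issuer itself is healthy
kubectl get clusterissuer letsencrypt-prod -o wide
kubectl get certificates --all-namespaces -w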
✅ And Then—Success
NAMESPACE NAME READY SECRET AGE
qaclinicaly bida-fe-qaclinicaly.example.net-crt True bida-fe-qaclinicaly.example.net-crt 3m
qaclinicaly bida-gateway-qaclinicaly.example.net-crt True bida-gateway-qaclinicaly.example.net-crt 3m
qaprod bida-fe-qaprod.example.net-crt True bida-fe-qaprod.example.net-crt 2m59s
qaprod eduaid-admin-qaprod.example.net-crt True eduaid-admin-qaprod.example.net-crt 2m59s
sonarqube sonarqube.example.net-crt True sonarqube.example.net-crt 2m59s
tools kafka-dev.example.net-tls True kafka-dev.example.net-tls 2m59s
tools pgadmin-dev.example.net-tls True pgadmin-dev.example.net-tls 2m59s
ALL 7 CERTIFICATES flipped to READY = True 🎉
📘 Chapter 9: Lessons Learned
🔧 Technical Insights
- CRD Installation Order is Critical: Install core dependencies first. Cert-manager before anything else.
- IAM Permissions are Tricky: Minimal IAM policies might pass linting, but they’ll fail at runtime. Use comprehensive, purpose-built roles.
- LoadBalancer Schemes Matter: The difference between internal and internet-facing can break certificate validation entirely.
- DNS Automation Saves Time and Sanity: Manual Route 53 updates are error-prone. Automate with scripts and avoid the guesswork.
- EBS Snapshots are Underrated: Sometimes the simplest tools are the most reliable. EBS snapshots gave me peace of mind and fast recovery.
🧠 Operational Insights
- Plan for the Unexpected: SSL certificate issues took more time than the core migration itself.
- Automate Early, Automate Often: The scripts I wrote saved hours and helped enforce repeatable processes.
- Document Everything: Every command, every fix, every gotcha—write it down. It pays off when something goes wrong (and it will).
- Be Patient: DNS propagation and cert validation can be slow. Don't panic—just wait.
- Always Have a Rollback Plan: Keeping the old cluster alive gave me confidence to move fast with less fear of failure.
🛠️ Custom Tools That Saved Us
- scripts/update-dns-records.sh: Automated DNS cutover
- scripts/manual-ebs-backup.sh: Fast and reliable data backup
- letsencrypt-clusterissuer.yaml: Enabled SSL cert automation
- Comprehensive IAM policies: Smooth AWS integration with the load balancer controller
📊 Chapter 10: The Final Status
✅ Migration Scorecard
Area | Status |
---|---|
Infrastructure | 46 CRDs and all operators deployed ✅ |
Data Migration | EBS volumes restored successfully ✅ |
DNS Migration | All 7 domains updated ✅ |
SSL Certificates | All validated and active ✅ |
LoadBalancer | Internet-facing and functional ✅ |
Applications | Fully deployed and operational ✅ |
Performance Metrics
- Total Migration Time: ~18 hours (including troubleshooting)
- Downtime: 0 minutes (DNS cutover was seamless)
- Data Loss: 0 bytes
- Certificate Validation Time: 3 minutes (after fixes)
- DNS Propagation Time: 2-5 minutes
Conclusion: The Journey's End
What started as a routine Kubernetes version upgrade turned into an epic journey through the depths of AWS IAM policies, LoadBalancer configurations, and SSL certificate validation. We faced challenges we never expected and learned lessons that will serve us well in future migrations.
The key takeaway? Kubernetes migrations are never just about Kubernetes. They're about the entire ecosystem—DNS, SSL certificates, cloud provider integrations, and all the moving parts that make modern applications work.
Our hybrid approach using manual EBS snapshots proved to be the right choice for our use case. While it required more manual work upfront, it gave us confidence in our data integrity and a clear rollback path.
What's Next?
With our new v1.33 cluster running smoothly, we're already planning for the future:
- Implementing GitOps for better deployment automation
- Enhancing our monitoring and alerting
- Preparing for the next major version upgrade (with better automation!)
Final Words
To anyone embarking on a similar journey: expect the unexpected, automate everything you can, and always have a rollback plan. The path may be challenging, but the destination—a modern, secure, and scalable Kubernetes cluster—is worth every debugging session.
Migration Status: ✅ COMPLETE
The cluster is dead, long live the cluster!