Navigating AWS EKS with Terraform: Configuring Karpenter for Just-in-Time Node Provisioning


In this article, we will integrate Karpenter, which will enable our cluster to provision nodes dynamically based on actual workload requirements. Unlike Cluster Autoscaler which scales existing node groups, Karpenter provisions exactly the right-sized nodes on demand. We will need to make some adjustments to our EKS module.

Prerequisites

Ensure you have:

  • An active AWS account
  • Terraform installed and configured
  • Helm installed
  • kubectl installed

Setup Overview

In the previous part of the series, we installed and configured the Kubernetes Cluster Autoscaler (CA) on an EKS cluster provisioned with Terraform. That article showed how to let Kubernetes resize existing node groups when Pods can't be scheduled.

Karpenter takes a different approach; it's an open-source autoscaling solution from AWS that goes beyond resizing existing node groups. Instead, Karpenter talks directly to AWS and launches the right EC2 instances on demand to fit the Pods that are actually pending.

Key Architectural Difference: Self-Managed Nodes

Important: Nodes provisioned by Karpenter will appear as "Self-managed" in the AWS EKS console, while Cluster Autoscaler nodes show as part of "Managed Node Groups". This is by design:

  • Cluster Autoscaler: Works within existing Auto Scaling Groups (ASGs) or EKS Managed Node Groups. It scales by adjusting the desired capacity of predefined node groups.
  • Karpenter: Provisions nodes as individual EC2 instances directly using the AWS API, bypassing ASGs and node groups entirely. This gives Karpenter the flexibility to dynamically select instance types, launch nodes faster, and optimize costs.

Both approaches result in fully functional nodes in your cluster - the "self-managed" designation simply indicates that Karpenter manages the instance lifecycle independently. This architecture enables Karpenter's key advantages: faster provisioning (30-60 seconds vs 2-5 minutes), dynamic instance type selection, and more efficient resource utilization.
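
Once everything below is deployed, you can also see this distinction from kubectl rather than the console by comparing node labels. The label keys here are assumptions based on a standard managed node group plus a v1alpha5-era Karpenter install:

# Managed node group nodes carry an eks.amazonaws.com/nodegroup label;
# Karpenter-provisioned nodes carry karpenter.sh/* labels instead.
kubectl get nodes -L eks.amazonaws.com/nodegroup -L karpenter.sh/provisioner-name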

For more details, see the official Karpenter documentation.

Prepare Your EKS Cluster

Ensure your EKS cluster is ready. If needed, refer back to the earlier guide in this series on setting up an EKS cluster with Terraform.

Modifications

To support Karpenter's auto-discovery feature, we need to update the tags on our EKS resources. Unlike Cluster Autoscaler which used k8s.io/cluster-autoscaler/* tags, Karpenter uses karpenter.sh/discovery tags to identify the subnets and security groups it can use when provisioning nodes.

Update EKS Module to Support New Parameters

First, we need to extend our EKS module to support the parameters required for Karpenter. Add these variables to modules/aws/eks/v1/variables.tf:

variable "eks_cluster_version" {
  description = "Kubernetes version for the EKS cluster"
  type        = string
  default     = null
}

variable "additional_role_mappings" {
  description = "Additional IAM role mappings for aws-auth ConfigMap"
  type = list(object({
    rolearn  = string
    username = string
    groups   = list(string)
  }))
  default = []
}

variable "additional_launch_template_tags" {
  description = "Additional tags to add to the launch template"
  type        = map(string)
  default     = {}
}

Update modules/aws/eks/v1/main.tf to use these variables:

  1. Add version to the EKS cluster:
resource "aws_eks_cluster" "main" {
  name                      = var.cluster_name
  role_arn                  = aws_iam_role.eks_cluster_role.arn
  enabled_cluster_log_types = var.enabled_cluster_log_types
  version                   = var.eks_cluster_version  # Add this line
  # ... rest of configuration
}

  2. Update the launch template tags to merge additional tags:
resource "aws_launch_template" "eks_node_group" {
  # ... existing configuration ...

  tags = merge(
    {
      "Name"                                      = "${var.cluster_name}-eks-node-group"
      "kubernetes.io/cluster/${var.cluster_name}" = "owned"
    },
    var.additional_launch_template_tags
  )
}

  3. Update the aws-auth ConfigMap to include additional role mappings:
resource "kubernetes_config_map" "aws_auth" {
  metadata {
    name      = "aws-auth"
    namespace = "kube-system"
  }

  data = {
    mapRoles = yamlencode(concat(
      [
        {
          rolearn  = aws_iam_role.eks_admins_role.arn
          username = aws_iam_role.eks_admins_role.name
          groups   = ["system:masters"]
        },
        {
          rolearn  = aws_iam_role.node_role.arn
          username = "system:node:{{EC2PrivateDNSName}}"
          groups   = ["system:bootstrappers", "system:nodes"]
        }
      ],
      var.additional_role_mappings
    ))
    # ... rest of configuration
  }
}

  4. Update the nodes security group to include additional tags:
resource "aws_security_group" "eks_nodes_sg" {
  name        = "${var.cluster_name}-eks-nodes-sg"
  description = "Security group for all nodes in the cluster"
  vpc_id      = var.vpc_id

  tags = merge(
    {
      Name                                        = "${var.cluster_name}-eks-nodes-sg"
      "kubernetes.io/cluster/${var.cluster_name}" = "owned"
    },
    var.additional_launch_template_tags
  )
}

  5. Add new outputs to modules/aws/eks/v1/outputs.tf:
output "oidc_provider_url" {
  description = "The OIDC URL of the Cluster"
  value       = aws_iam_openid_connect_provider.eks.url
}

output "node_instance_role_arn" {
  description = "The ARN of the node instance role"
  value       = aws_iam_role.node_role.arn
}

Update VPC and EKS Configuration

Critical: Karpenter needs to discover subnets and security groups using tags. You must add the karpenter.sh/discovery tag to private subnets only (not public subnets) and to the node security group.

Why private subnets only? Karpenter-provisioned nodes need outbound internet access to download container images and communicate with the EKS control plane. Private subnets with NAT gateway provide this access. If you tag public subnets, Karpenter might launch nodes there without public IPs, causing them to fail to join the cluster.

Update vpc.tf to add Karpenter discovery tags:

module "vpc" {
  source = "./modules/aws/vpc/v1"

  vpc_name             = "${var.cluster_name}-vpc"
  cidr_block           = "10.0.0.0/16"
  nat_gateway          = true
  enable_dns_support   = true
  enable_dns_hostnames = true

  public_subnet_count  = 3
  private_subnet_count = 3
  public_subnet_tags = {
    "kubernetes.io/cluster/${var.cluster_name}" = "shared"
    "kubernetes.io/role/elb"                    = "1"
    # DO NOT add karpenter.sh/discovery tag to public subnets
  }

  private_subnet_tags = {
    "kubernetes.io/cluster/${var.cluster_name}" = "shared"
    "kubernetes.io/role/internal-elb"           = "1"
    "karpenter.sh/discovery"                    = var.cluster_name  # Only tag private subnets
  }
}

Now update eks.tf to configure the EKS cluster with Karpenter support:

module "eks" {
  source = "./modules/aws/eks/v1"

  region              = var.region
  cluster_name        = var.cluster_name
  private_subnets     = module.vpc.private_subnets
  public_subnets      = module.vpc.public_subnets
  vpc_id              = module.vpc.vpc_id
  eks_cluster_version = "1.31"

  managed_node_groups = {
    demo_group = {
      name           = "demo-node-group"
      desired_size   = 2
      min_size       = 1
      max_size       = 3
      instance_types = ["t3a.small"]
    }
  }

  additional_role_mappings = [
    {
      rolearn  = aws_iam_role.karpenter_controller_role.arn
      username = "system:serviceaccount:karpenter:karpenter"
      groups   = ["system:masters"]
    }
  ]

  additional_launch_template_tags = {
    "karpenter.sh/discovery" = var.cluster_name
  }
}

Note on Hybrid Architecture: This configuration uses a best practice approach where:

  • The managed node group (2 small nodes) hosts critical system components (CoreDNS, kube-proxy, etc.)
  • Karpenter dynamically provisions additional nodes for application workloads

This hybrid model provides stability for system pods while giving you Karpenter's flexibility and cost optimization for your applications.
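
One way to enforce that split, as a sketch rather than something the scripts in this article configure, is to give critical Deployments a node affinity that avoids Karpenter-provisioned nodes, for example by requiring that the v1alpha5 karpenter.sh/provisioner-name label is absent:

# Hypothetical affinity block for a system Deployment's pod template:
# only schedule onto nodes Karpenter did not provision.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: karpenter.sh/provisioner-name
              operator: DoesNotExist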

Create IAM Role for Karpenter

The Karpenter controller needs specific permissions to interact with EC2 instances and AWS services. We'll create an IAM role that uses IRSA (IAM Roles for Service Accounts) to allow the Karpenter pods to assume this role securely.

Create a new file called karpenter.tf at the root of your repository:

############################################################################################################
### KARPENTER
############################################################################################################

# IAM Role for Karpenter Controller
data "aws_iam_policy_document" "karpenter_assume_role_policy" {
  statement {
    actions = ["sts:AssumeRoleWithWebIdentity"]
    effect  = "Allow"

    condition {
      test     = "StringEquals"
      variable = "${replace(module.eks.oidc_provider_url, "https://", "")}:sub"
      values   = ["system:serviceaccount:karpenter:karpenter"]
    }

    principals {
      identifiers = [module.eks.oidc_provider_arn]
      type        = "Federated"
    }
  }
}

resource "aws_iam_role" "karpenter_controller_role" {
  name               = "${var.cluster_name}-karpenter-controller"
  assume_role_policy = data.aws_iam_policy_document.karpenter_assume_role_policy.json
}

# Karpenter Controller Policy
data "aws_iam_policy_document" "karpenter_controller_policy" {
  statement {
    sid    = "AllowScopedEC2InstanceAccessActions"
    effect = "Allow"
    actions = [
      "ec2:RunInstances",
      "ec2:CreateFleet"
    ]
    resources = [
      "arn:aws:ec2:${var.region}::image/*",
      "arn:aws:ec2:${var.region}::snapshot/*",
      "arn:aws:ec2:${var.region}:*:security-group/*",
      "arn:aws:ec2:${var.region}:*:subnet/*",
      "arn:aws:ec2:${var.region}:*:launch-template/*"
    ]
  }

  statement {
    sid    = "AllowScopedEC2InstanceActionsWithTags"
    effect = "Allow"
    actions = [
      "ec2:RunInstances",
      "ec2:CreateFleet",
      "ec2:CreateLaunchTemplate"
    ]
    resources = [
      "arn:aws:ec2:${var.region}:*:fleet/*",
      "arn:aws:ec2:${var.region}:*:instance/*",
      "arn:aws:ec2:${var.region}:*:volume/*",
      "arn:aws:ec2:${var.region}:*:network-interface/*",
      "arn:aws:ec2:${var.region}:*:launch-template/*",
      "arn:aws:ec2:${var.region}:*:spot-instances-request/*"
    ]
  }

  statement {
    sid    = "AllowScopedResourceCreationTagging"
    effect = "Allow"
    actions = [
      "ec2:CreateTags"
    ]
    resources = [
      "arn:aws:ec2:${var.region}:*:fleet/*",
      "arn:aws:ec2:${var.region}:*:instance/*",
      "arn:aws:ec2:${var.region}:*:volume/*",
      "arn:aws:ec2:${var.region}:*:network-interface/*",
      "arn:aws:ec2:${var.region}:*:launch-template/*",
      "arn:aws:ec2:${var.region}:*:spot-instances-request/*"
    ]
  }

  statement {
    sid    = "AllowScopedResourceTagging"
    effect = "Allow"
    actions = [
      "ec2:CreateTags"
    ]
    resources = [
      "arn:aws:ec2:${var.region}:*:instance/*"
    ]
  }

  statement {
    sid    = "AllowScopedDeletion"
    effect = "Allow"
    actions = [
      "ec2:TerminateInstances",
      "ec2:DeleteLaunchTemplate"
    ]
    resources = [
      "arn:aws:ec2:${var.region}:*:instance/*",
      "arn:aws:ec2:${var.region}:*:launch-template/*"
    ]
  }

  statement {
    sid    = "AllowRegionalReadActions"
    effect = "Allow"
    actions = [
      "ec2:DescribeAvailabilityZones",
      "ec2:DescribeImages",
      "ec2:DescribeInstances",
      "ec2:DescribeInstanceTypeOfferings",
      "ec2:DescribeInstanceTypes",
      "ec2:DescribeLaunchTemplates",
      "ec2:DescribeSecurityGroups",
      "ec2:DescribeSpotPriceHistory",
      "ec2:DescribeSubnets"
    ]
    resources = ["*"]
  }

  statement {
    sid    = "AllowSSMReadActions"
    effect = "Allow"
    actions = [
      "ssm:GetParameter"
    ]
    resources = [
      "arn:aws:ssm:${var.region}::parameter/aws/service/*"
    ]
  }

  statement {
    sid    = "AllowPricingReadActions"
    effect = "Allow"
    actions = [
      "pricing:GetProducts"
    ]
    resources = ["*"]
  }

  statement {
    sid    = "AllowInterruptionQueueActions"
    effect = "Allow"
    actions = [
      "sqs:DeleteMessage",
      "sqs:GetQueueUrl",
      "sqs:GetQueueAttributes",
      "sqs:ReceiveMessage"
    ]
    resources = ["arn:aws:sqs:${var.region}:*:${var.cluster_name}"]
  }

  statement {
    sid    = "AllowPassNodeIAMRole"
    effect = "Allow"
    actions = [
      "iam:PassRole"
    ]
    resources = [
      module.eks.node_instance_role_arn
    ]
  }

  statement {
    sid    = "AllowScopedInstanceProfileCreationActions"
    effect = "Allow"
    actions = [
      "iam:CreateInstanceProfile"
    ]
    resources = [
      "arn:aws:iam::*:instance-profile/*"
    ]
  }

  statement {
    sid    = "AllowScopedInstanceProfileTagActions"
    effect = "Allow"
    actions = [
      "iam:TagInstanceProfile"
    ]
    resources = [
      "arn:aws:iam::*:instance-profile/*"
    ]
  }

  statement {
    sid    = "AllowScopedInstanceProfileActions"
    effect = "Allow"
    actions = [
      "iam:AddRoleToInstanceProfile",
      "iam:RemoveRoleFromInstanceProfile",
      "iam:DeleteInstanceProfile"
    ]
    resources = [
      "arn:aws:iam::*:instance-profile/*"
    ]
  }

  statement {
    sid    = "AllowInstanceProfileReadActions"
    effect = "Allow"
    actions = [
      "iam:GetInstanceProfile"
    ]
    resources = [
      "arn:aws:iam::*:instance-profile/*"
    ]
  }

  statement {
    sid    = "AllowAPIServerEndpointDiscovery"
    effect = "Allow"
    actions = [
      "eks:DescribeCluster"
    ]
    resources = [
      "arn:aws:eks:${var.region}:*:cluster/${var.cluster_name}"
    ]
  }
}

resource "aws_iam_role_policy" "karpenter_controller_policy" {
  name   = "${var.cluster_name}-karpenter-controller-policy"
  role   = aws_iam_role.karpenter_controller_role.id
  policy = data.aws_iam_policy_document.karpenter_controller_policy.json
}

The Karpenter controller policy gives Karpenter the permissions it needs to:

  • Launch and terminate EC2 instances
  • Create and manage launch templates
  • Describe EC2 resources (subnets, security groups, instance types, etc.)
  • Manage instance profiles
  • Tag resources appropriately

We also pass the role mapping to the config map in the EKS module (shown in the eks.tf above), which allows the Karpenter service account to assume this IAM role.

We also need to update our Terraform outputs in outputs.tf to expose values needed by our deployment scripts:

output "karpenter_controller_role_arn" {
  description = "The ARN of the Karpenter controller IAM role"
  value       = aws_iam_role.karpenter_controller_role.arn
}

output "aws_region" {
  description = "The AWS region where the EKS cluster is deployed"
  value       = var.region
}

output "cluster_name" {
  description = "The name of the EKS cluster"
  value       = var.cluster_name
}

output "cluster_endpoint" {
  description = "The endpoint for the EKS cluster"
  value       = module.eks.cluster_endpoint
}

output "node_instance_role_arn" {
  description = "The ARN of the node instance role"
  value       = module.eks.node_instance_role_arn
}

We can then apply our changes with terraform:

terraform apply

Terraform Apply Output

We can confirm our cluster is up and running in the AWS console.

Cluster in AWS Console
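
If you prefer the CLI to the console, a quick status check works too, assuming the AWS CLI is configured for the same account and you run it from the Terraform root so the outputs resolve:

# Should print ACTIVE once the control plane is ready.
aws eks describe-cluster \
  --name "$(terraform output -raw cluster_name)" \
  --region "$(terraform output -raw aws_region)" \
  --query 'cluster.status' \
  --output text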

Install Karpenter using Helm

We have created the role mapping that the karpenter service account will use, but we still need to create the actual service account. The manifest to create that will look like this:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: karpenter
  namespace: karpenter
  annotations:
    eks.amazonaws.com/role-arn: "arn:aws:iam::<ACCOUNT_ID>:role/<IAM_ROLE_NAME>"

For simplicity (or just because I'm lazy), I have created scripts to help with the rest of the article. They read values from terraform output, so you don't have to fill in placeholders by hand or keep track of which command to run first.

The script scripts/deploy_karpenter.sh updates our kubectl context to that of the cluster, creates the service account for Karpenter with the IRSA annotation, and deploys Karpenter using helm:

#!/bin/bash
set -euxo pipefail

# Retrieve Terraform outputs
REGION=$(terraform output -raw aws_region)
CLUSTER_NAME=$(terraform output -raw cluster_name)
KARPENTER_ROLE_ARN=$(terraform output -raw karpenter_controller_role_arn)
CLUSTER_ENDPOINT=$(terraform output -raw cluster_endpoint)
NODE_INSTANCE_ROLE_ARN=$(terraform output -raw node_instance_role_arn)

# Extract the role name from the ARN
NODE_INSTANCE_ROLE_NAME=$(echo $NODE_INSTANCE_ROLE_ARN | awk -F'/' '{print $NF}')

# Update kubeconfig
aws eks update-kubeconfig --region $REGION --name $CLUSTER_NAME

# Create the karpenter namespace
kubectl create namespace karpenter || true

# Create the service account for Karpenter
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ServiceAccount
metadata:
  name: karpenter
  namespace: karpenter
  annotations:
    eks.amazonaws.com/role-arn: $KARPENTER_ROLE_ARN
EOF

# Install Karpenter using Helm
export KARPENTER_VERSION="0.16.3"

helm repo add karpenter https://charts.karpenter.sh/ || true
helm repo update

helm upgrade --install karpenter karpenter/karpenter \
  --namespace karpenter \
  --create-namespace \
  --version ${KARPENTER_VERSION} \
  --set serviceAccount.create=false \
  --set serviceAccount.name=karpenter \
  --set clusterName=${CLUSTER_NAME} \
  --set clusterEndpoint=${CLUSTER_ENDPOINT} \
  --set aws.defaultInstanceProfile=${NODE_INSTANCE_ROLE_NAME} \
  --set controller.resources.requests.cpu=500m \
  --set controller.resources.requests.memory=512Mi \
  --set controller.resources.limits.cpu=1 \
  --set controller.resources.limits.memory=1Gi \
  --set replicas=1 \
  --wait || true

# Wait for Karpenter to be ready
kubectl wait --for=condition=available --timeout=300s deployment/karpenter -n karpenter || true

# Create the AWSNodeTemplate (node configuration for Karpenter)
cat <<EOF | kubectl apply -f -
apiVersion: karpenter.k8s.aws/v1alpha1
kind: AWSNodeTemplate
metadata:
  name: default
spec:
  amiFamily: AL2
  instanceProfile: ${NODE_INSTANCE_ROLE_NAME}
  subnetSelector:
    karpenter.sh/discovery: ${CLUSTER_NAME}
  securityGroupSelector:
    karpenter.sh/discovery: ${CLUSTER_NAME}
  tags:
    karpenter.sh/discovery: ${CLUSTER_NAME}
EOF

# Create NodePool
kubectl apply -f kubernetes/karpenter/nodepool.yaml

# Deploy test deployment (starts at 0 replicas)
kubectl apply -f kubernetes/karpenter/test-deployment.yaml

echo "Karpenter installation completed successfully!"
echo ""
echo "To verify Karpenter is running:"
echo "kubectl get pods -n karpenter"
echo ""
echo "To view Karpenter logs:"
echo "kubectl logs -f -n karpenter -l app.kubernetes.io/name=karpenter"
echo ""
echo "To test autoscaling, run:"
echo "./scripts/test_karpenter.sh"

The script also creates the Karpenter AWSNodeTemplate (which tells Karpenter how to configure EC2 instances) and applies the Provisioner configuration from kubernetes/karpenter/nodepool.yaml:

apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  requirements:
    - key: kubernetes.io/arch
      operator: In
      values: ["amd64"]
    - key: kubernetes.io/os
      operator: In
      values: ["linux"]
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["on-demand"]
    - key: karpenter.k8s.aws/instance-category
      operator: In
      values: ["t"]
    - key: karpenter.k8s.aws/instance-generation
      operator: Gt
      values: ["2"]
  providerRef:
    name: default
  limits:
    resources:
      cpu: 10
  ttlSecondsAfterEmpty: 30
  ttlSecondsUntilExpired: 2592000 # 30 days

This Provisioner configuration:

  • Only provisions amd64 Linux nodes
  • Uses on-demand instances (you can add "spot" for cost savings)
  • Restricts to t-family instances (t3, t3a, etc.) generation 3 or higher
  • Sets a CPU limit of 10 vCPUs total across all Karpenter-provisioned nodes
  • Automatically deprovisions nodes 30 seconds after they become empty
  • Expires nodes after 30 days for security patching

Run the deployment script:

chmod +x scripts/deploy_karpenter.sh
./scripts/deploy_karpenter.sh

Karpenter Deployment Script Output

Verify Karpenter is running:

kubectl get pods -n karpenter

Karpenter Pod Running

You will notice that we only have two nodes at the moment.

Initial Node Count
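
kubectl confirms the same baseline, along with the Karpenter custom resources the script applied (the CRD plural names below are my assumption for the v0.16.x chart; adjust if your install registers them differently):

kubectl get nodes
kubectl get provisioners.karpenter.sh
kubectl get awsnodetemplates.karpenter.k8s.aws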

Deploy Applications To Trigger Scale Up Operation

Next, we will deploy a test workload to the cluster. The test deployment requests resources that our two small EC2 instances cannot fully accommodate; Karpenter will step in and provision an additional node so that every Pod can be scheduled.

The test deployment kubernetes/karpenter/test-deployment.yaml looks like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate
  namespace: default
spec:
  replicas: 0
  selector:
    matchLabels:
      app: inflate
  template:
    metadata:
      labels:
        app: inflate
    spec:
      terminationGracePeriodSeconds: 0
      containers:
        - name: inflate
          image: public.ecr.aws/eks-distro/kubernetes/pause:3.7
          resources:
            requests:
              cpu: "1"
              memory: "1.5Gi"

The script scripts/test_karpenter.sh scales this deployment from 0 to 5 replicas:

#!/bin/bash
set -euxo pipefail

echo "Scaling up the inflate deployment to trigger Karpenter provisioning..."

# Scale up the deployment to trigger node provisioning
kubectl scale deployment inflate --replicas=5 -n default

echo ""
echo "Deployment scaled to 5 replicas. Monitoring cluster events..."

Run the test script:

chmod +x scripts/test_karpenter.sh
./scripts/test_karpenter.sh

test_karpenter output

Watch Karpenter logs to see it making provisioning decisions:

kubectl logs -f -n karpenter -l app.kubernetes.io/name=karpenter -c controller

You'll see logs like:

INFO    controller.provisioner  found provisionable pod(s)
INFO    controller.provisioner  computed new node(s) to fit pod(s)
INFO    controller.provisioner  created node with 1 pods requesting cpu=1000m, memory=1536Mi

Karpenter Scale up evidence

You will notice that Karpenter did its job and added an extra node to our cluster, and the pending Pods were scheduled successfully.

Important Note: In the AWS EKS console, you'll see the new node listed as "Self-managed" under the Compute tab, while your original 2 nodes show as part of the "Managed Node Group". This is expected! Karpenter provisions nodes as individual EC2 instances directly (not through node groups), which is why they appear as self-managed. You can verify these are Karpenter nodes by checking their labels:

kubectl get nodes --show-labels | grep karpenter

You'll see labels like karpenter.sh/capacity-type=on-demand, karpenter.k8s.aws/instance-category=t, etc.

nodes with labels

Annotations in console

AWS Console View After Scale-Up

If you check the events of your cluster, you will be able to find the scale up events.

Scale up events

Scale Down Operation

Now we will scale down the test deployment and watch to see if Karpenter automatically removes the node.

The cleanup script scripts/cleanup_karpenter_test.sh scales the deployment back to 0:

#!/bin/bash
set -euxo pipefail

echo "Scaling down the inflate deployment..."

# Scale down the deployment
kubectl scale deployment inflate --replicas=0 -n default

echo ""
echo "Deployment scaled to 0 replicas."
echo "Watch nodes with: kubectl get nodes -w"
echo "Karpenter should remove the empty node in about 30 seconds (ttlSecondsAfterEmpty)"

Run the cleanup script:

chmod +x scripts/cleanup_karpenter_test.sh
./scripts/cleanup_karpenter_test.sh

Monitor the nodes:

kubectl get nodes -w

After about 30 seconds (the ttlSecondsAfterEmpty setting in our Provisioner), Karpenter will:

  1. Cordon the empty node
  2. Drain any remaining system pods
  3. Terminate the EC2 instance

You'll see this in the Karpenter logs:

INFO    controller.deprovisioning   deprovisioning via emptiness delete
INFO    controller.deprovisioning   cordoned node
INFO    controller.deprovisioning   deleted node

Running ./scripts/delete_all.sh

Karpenter scales down the cluster back to two nodes much faster than Cluster Autoscaler (30-60 seconds vs 10+ minutes).

Understanding Karpenter's Advantages

Through this exercise, you've seen several advantages of Karpenter over Cluster Autoscaler:

Faster Provisioning

Karpenter provisions nodes in 30-60 seconds because it launches instances directly rather than scaling Auto Scaling Groups.

No Pre-defined Node Groups

You don't need to define multiple node groups for different instance types. Karpenter chooses the best fit automatically based on pending pod requirements.

Better Bin Packing

Karpenter looks at all pending pods together and provisions nodes that maximize utilization.

Faster Scale Down

Karpenter can deprovision nodes in 30-60 seconds vs 10+ minutes for Cluster Autoscaler.

Consolidation

Karpenter actively consolidates nodes to reduce costs, even moving pods to smaller instances when possible.
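
In the v1alpha5 API used in this article, consolidation is opt-in on the Provisioner and is mutually exclusive with ttlSecondsAfterEmpty, so treat this sketch as a replacement for the emptiness TTL configured earlier rather than an addition:

# Hypothetical change to the Provisioner spec: enable active consolidation
# instead of relying on ttlSecondsAfterEmpty.
spec:
  consolidation:
    enabled: true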

Spot Instance Support

Adding spot instances is as simple as adding "spot" to the karpenter.sh/capacity-type values in your Provisioner.
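
As a sketch against the same v1alpha5 Provisioner used earlier, the requirement block would look like this; Karpenter generally prefers spot when both values are allowed:

# Hypothetical tweak to the capacity-type requirement in nodepool.yaml.
- key: karpenter.sh/capacity-type
  operator: In
  values: ["spot", "on-demand"]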

Comparison: Cluster Autoscaler vs Karpenter

| Feature | Cluster Autoscaler | Karpenter |
|---|---|---|
| Provisioning Time | 2-5 minutes | 30-60 seconds |
| Node Management | Auto Scaling Groups / Managed Node Groups | Individual EC2 instances (self-managed) |
| Instance Selection | Predefined in node group | Dynamic based on pod requirements |
| Scale Down Time | 10+ minutes | 30-60 seconds |
| Spot Support | Via ASG configuration | Native, simple toggle |
| Bin Packing | Limited (per node group) | Advanced (across all options) |
| Consolidation | No | Yes (active rebalancing) |
| Configuration | Via ASG/node group settings | Via Kubernetes CRDs |
| Console Display | Shows as "Managed Node Group" | Shows as "Self-managed" |

The complete code used in this article can be found on this branch of the repository.

Troubleshooting Common Issues

Issue: Nodes Stuck in NotReady State

Symptom: Karpenter provisions a node, but it never becomes Ready. Kubelet never starts.

Cause: The node was likely launched in a public subnet without a public IP, preventing it from reaching the EKS API server.

Solution: Ensure only private subnets are tagged with karpenter.sh/discovery. Remove this tag from public subnets:

public_subnet_tags = {
  "kubernetes.io/cluster/${var.cluster_name}" = "shared"
  "kubernetes.io/role/elb"                    = "1"
  # DO NOT add karpenter.sh/discovery tag here
}

private_subnet_tags = {
  "kubernetes.io/cluster/${var.cluster_name}" = "shared"
  "kubernetes.io/role/internal-elb"           = "1"
  "karpenter.sh/discovery"                    = var.cluster_name  # Only here
}

After fixing the tags, run terraform apply, then delete any stuck nodes and let Karpenter recreate them.
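
A quick way to confirm the tag now exists only on the private subnets, and to clear a stuck node so Karpenter replaces it (substitute the NotReady node name reported by kubectl get nodes):

# Only private subnet IDs should come back, with MapPublicIpOnLaunch showing False.
aws ec2 describe-subnets \
  --filters "Name=tag:karpenter.sh/discovery,Values=$(terraform output -raw cluster_name)" \
  --query 'Subnets[].[SubnetId,MapPublicIpOnLaunch]' \
  --output table

# Delete the stuck node; Karpenter will provision a replacement if pods are still pending.
kubectl delete node <node-name>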

Issue: Capacity-Block Warnings in Logs

Symptom: Karpenter logs show errors like:

ERROR   controller      Reconciler error    {"commit": "...", "reconciler": "capacity-block", ...}

Cause: This is a harmless warning in Karpenter v0.16.3. The controller tries to query newer GPU instance types (P5, P6, Trn2) that didn't exist when this version was released.

Impact: None - these warnings don't affect Karpenter's functionality for standard instance types.

Solution: Either ignore the warnings or filter them out:

kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter -f | grep -v "capacity-block"

To permanently suppress, you could upgrade to a newer Karpenter version, but test thoroughly as newer versions may introduce breaking changes.

Issue: Why Are My Nodes "Self-Managed" in AWS Console?

Symptom: New nodes appear as "Self-managed" in the EKS console instead of being part of a managed node group.

Explanation: This is expected behavior for Karpenter. Unlike Cluster Autoscaler which scales existing node groups, Karpenter provisions individual EC2 instances directly using the AWS API. These appear as "self-managed" because they're not part of an EKS Managed Node Group or Auto Scaling Group.

Verification: Check that the nodes have Karpenter labels:

kubectl get nodes --show-labels | grep karpenter

You should see labels like karpenter.sh/capacity-type=on-demand, karpenter.k8s.aws/instance-category=t, etc.

Reference: See Karpenter documentation for architectural details.

Conclusion

Your EKS cluster now has a functioning Karpenter setup that is adept at provisioning node resources in response to workload changes. Karpenter provides faster scaling, smarter instance selection, and better cost optimization compared to traditional Cluster Autoscaler.

Don't forget to clean up any resources you are not using; idle clusters and nodes can get quite expensive.

To clean up Karpenter resources:

chmod +x scripts/delete_all.sh
./scripts/delete_all.sh

If you want to destroy all infrastructure:

terraform destroy

Terraform destroy output

Important: Karpenter-provisioned nodes should be cleaned up automatically when you delete the Provisioner, but you may want to verify in the AWS console that no orphaned instances remain.
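
If you would rather verify from the CLI, run a tag-based check like this before terraform destroy removes the outputs; it relies on the karpenter.sh/discovery tag that the AWSNodeTemplate applies to its instances:

# Lists any EC2 instances still carrying the Karpenter discovery tag for this cluster.
aws ec2 describe-instances \
  --region "$(terraform output -raw aws_region)" \
  --filters "Name=tag:karpenter.sh/discovery,Values=$(terraform output -raw cluster_name)" \
            "Name=instance-state-name,Values=pending,running" \
  --query 'Reservations[].Instances[].InstanceId' \
  --output text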

For more advanced configurations and best practices, refer to the official Karpenter documentation.

