EKS Pod Limits: When Your Node Just Can't Fit One More Pod

The Symptom

After scaling up our mypie-infra ArgoCD ApplicationSet, several infra add-ons were deployed: cert-manager, the AWS Load Balancer Controller (LBC), metrics-server, and Atlantis. At the same time, ArgoCD was running its own 7-pod control plane. Everything scheduled fine until a restart of the argocd-repo-server pod triggered a rolling update.

The new pod couldn’t be scheduled:

Events:
  Warning  FailedScheduling  14m  default-scheduler
    0/1 nodes are available: 1 Too many pods.
    preemption: 0/1 nodes are available: 1 No preemption victims found.

$ kubectl get nodes
NAME                                          STATUS   ROLES    AGE    VERSION
ip-10-1-11-55.eu-central-1.compute.internal   Ready    <none>   95m    v1.32.12-eks-f69f56f

One node, 17 pods, node at capacity, one pod can’t schedule.

Why Does t3.medium Cap Out at 17 Pods?

AWS EKS uses the VPC CNI plugin (aws-node), which assigns each pod a real VPC IP address. The number of IPs available on a node is determined by the instance type’s ENI (Elastic Network Interface) limits:

max pods = (number of ENIs) × (IPs per ENI - 1) + 2

For t3.medium:

Attribute	Value
Max ENIs	3
Max IPs per ENI	6

max pods = 3 × (6 - 1) + 2 = 3 × 5 + 2 = 17

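The limit reaches the cluster through the kubelet: on EKS-optimized AMIs the node bootstrap sets the kubelet’s max-pods value from this formula, the node reports it as its allocatable pod count, and the scheduler won’t place more pods on the node than that.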

You can verify the limit:

kubectl get node <node-name> -o jsonpath='{.status.allocatable.pods}'
# 17
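The formula above can be sketched as a small shell function. The name max_pods is hypothetical, and the reading of the constants (one address per ENI reserved for the interface itself, plus 2 for host-network pods such as aws-node and kube-proxy) is the usual explanation rather than something stated in this post.

```shell
#!/bin/sh
# max_pods ENIS IPS_PER_ENI  (hypothetical helper, not part of any AWS tool)
# Default VPC CNI limit: each ENI keeps one address for itself, and 2 is
# added for pods that run on the host network and need no VPC IP.
max_pods() {
  echo $(( $1 * ($2 - 1) + 2 ))
}

max_pods 3 6   # t3.medium: 3 ENIs, 6 IPs per ENI -> prints 17
```

If the number printed here differs from the node's allocatable pod count, the kubelet was probably started with an explicit max-pods override.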

What Pods Were Taking Up All 17 Slots?

$ kubectl get pods -A --field-selector spec.nodeName=ip-10-1-11-55... \
    -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name' \
  | sort

Decomposed:

Component	Pod Count
`kube-system`: aws-node, kube-proxy, coredns ×2, ebs-csi ×3, metrics-server, LBC ×2	~11
`argocd`: application-controller, applicationset-controller, dex, redis, server, notifications	6
Total	17

The argocd-repo-server pod — the one that needed to restart — would have been pod #18.
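To get the per-namespace breakdown without counting by hand, the pod listing can be piped through a small counter. This is a minimal sketch: pods_per_namespace is a hypothetical helper, and it assumes the input has the namespace in the first column and no header row (hence --no-headers in the usage comment).

```shell
#!/bin/sh
# pods_per_namespace: reads "NAMESPACE NAME" lines on stdin and prints
# "COUNT NAMESPACE" for each namespace, sorted by namespace name.
pods_per_namespace() {
  awk 'NF >= 2 { n[$1]++ } END { for (ns in n) print n[ns], ns }' | sort -k2
}

# Usage against a live cluster (<node-name> is a placeholder):
# kubectl get pods -A --no-headers \
#     --field-selector spec.nodeName=<node-name> \
#     -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name' \
#   | pods_per_namespace
```

Summing the counts and comparing against the node's allocatable pods shows how many slots are left.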

Option 1: Add More Nodes (Quickest)

Scale the node group via AWS CLI:

aws eks update-nodegroup-config \
  --cluster-name mypie-eks-staging \
  --nodegroup-name mypie-eks-staging-general \
  --scaling-config minSize=1,maxSize=3,desiredSize=2 \
  --region eu-central-1

The second node joined in ~3 minutes. The pending pod scheduled immediately.
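A rough way to pick desiredSize is a ceiling division of the pod count by the per-node limit. The sketch below uses a hypothetical nodes_needed helper. Its optional third argument subtracts slots that per-node DaemonSet pods occupy on every node, and the value 3 in the second example is an assumption, not a number taken from this cluster. It looks only at pod slots and ignores CPU and memory.

```shell
#!/bin/sh
# nodes_needed PODS MAX_PODS_PER_NODE [DAEMONSET_PODS_PER_NODE]
# Ceiling division over the pod slots left after per-node DaemonSet pods.
nodes_needed() {
  usable=$(( $2 - ${3:-0} ))
  [ "$usable" -gt 0 ] || { echo "no usable pod slots per node" >&2; return 1; }
  echo $(( ($1 + usable - 1) / usable ))
}

nodes_needed 18 17      # 18 pods, 17 slots per node -> prints 2
nodes_needed 15 17 3    # 15 workload pods, 3 slots assumed taken per node -> prints 2
```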

Update Terraform to match (so the next apply doesn’t reset desired to 1):

resource "aws_eks_node_group" "general" {
  scaling_config {
    desired_size = var.environment == "production" ? 3 : 2   # was 1
    min_size     = var.environment == "production" ? 3 : 1
    max_size     = var.environment == "production" ? 10 : 3
  }

  # Prevent Terraform from overriding autoscaler-managed desired count
  lifecycle {
    ignore_changes = [scaling_config[0].desired_size]
  }
}

Option 2: Enable VPC CNI Prefix Delegation (More Pods Per Node)

If you want to keep a single t3.medium but fit more pods, enable prefix delegation on the VPC CNI. This assigns /28 CIDR prefixes to ENI slots instead of individual IPs, multiplying capacity by 16:

new max = (ENIs) × (IPs per ENI - 1) × 16 + 2
t3.medium: 3 × 5 × 16 + 2 = 242 pods (EKS recommends capping max-pods at 110 for instance types with fewer than 30 vCPUs)
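The same arithmetic, including the cap, as a shell sketch. max_pods_prefix is a hypothetical name, and the default cap of 110 is the figure quoted above, passed as an optional third argument so it can be changed.

```shell
#!/bin/sh
# max_pods_prefix ENIS IPS_PER_ENI [CAP]
# With prefix delegation each usable ENI slot holds a /28 prefix (16 IPs).
# The result is clamped to CAP, which defaults to 110.
max_pods_prefix() {
  raw=$(( $1 * ($2 - 1) * 16 + 2 ))
  cap=${3:-110}
  if [ "$raw" -gt "$cap" ]; then
    echo "$cap"
  else
    echo "$raw"
  fi
}

max_pods_prefix 3 6        # t3.medium: raw 242, capped -> prints 110
max_pods_prefix 3 6 250    # same instance with a higher cap -> prints 242
```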

Enable it on the EKS add-on:

aws eks update-addon \
  --cluster-name mypie-eks-staging \
  --addon-name vpc-cni \
  --configuration-values '{"env":{"ENABLE_PREFIX_DELEGATION":"true","WARM_PREFIX_TARGET":"1"}}' \
  --region eu-central-1

Important: After enabling prefix delegation, existing nodes need to be recycled for the new limits to take effect. The CNI calculates available IPs on startup. Rolling-restart the aws-node daemonset and cordon/drain/replace nodes, or simply scale-in and scale-out the node group.

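You also need to raise the kubelet’s max-pods setting, otherwise the node keeps advertising 17. For managed node groups, set it via a launch template with a custom bootstrap script; on Amazon Linux 2 EKS AMIs, bootstrap.sh accepts --use-max-pods false --kubelet-extra-args '--max-pods=110'. To work out the right value, start from the instance type’s ENI limits: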

# Look up the ENI and per-ENI IP limits for t3.medium (the inputs to the max-pods formula)
aws ec2 describe-instance-types \
  --instance-types t3.medium \
  --query 'InstanceTypes[0].NetworkInfo' \
  --output table

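Or use the max-pods-calculator.sh script from the awslabs/amazon-eks-ami repository, which takes a --cni-prefix-delegation-enabled flag.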

Option 3: Use a Larger Instance Type

The simplest long-term option: just use a bigger instance. t3.large gives you 35 pods, t3.xlarge gives 58, m5.large gives 29.
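These figures follow from the default formula. The sketch below recomputes them from a small table of ENI limits. The function name max_pods_table is hypothetical, and the ENI and per-ENI IP counts for t3.large, t3.xlarge and m5.large are values I believe match the EC2 documentation; confirm them with aws ec2 describe-instance-types before relying on them.

```shell
#!/bin/sh
# max_pods_table: reads "TYPE ENIS IPS_PER_ENI" lines on stdin and prints
# "TYPE MAX_PODS" using the default (no prefix delegation) formula.
max_pods_table() {
  while read -r type enis ips; do
    echo "$type $(( enis * (ips - 1) + 2 ))"
  done
}

# ENI limits below are assumed values, see the note above.
max_pods_table <<'EOF'
t3.medium 3 6
t3.large 3 12
t3.xlarge 4 15
m5.large 3 10
EOF
```

With these inputs the output is 17, 35, 58 and 29 pods respectively.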

For staging environments, t3.large is usually sufficient and cost-effective enough.

resource "aws_eks_node_group" "general" {
  instance_types = ["t3.large"]   # was "t3.medium"
}

Note: changing instance_types on an EKS managed node group forces replacement: Terraform creates a new node group and deletes the old one. Plan for node drain.

Which Option We Chose

For our staging cluster, we went with Option 1 — scaling to 2 nodes. It was the fastest fix and keeps costs low (2 × t3.medium ≈ $0.09/hr). We also added ignore_changes on desired_size so Cluster Autoscaler can freely scale between the min and max.

For the production cluster, we configured 3 nodes from the start with t3.large instances.

Key Takeaways

t3.medium on EKS has a default limit of 17 pods due to VPC CNI IP address limits.
The formula: max_pods = (ENIs) × (IPs-per-ENI - 1) + 2.
The symptom is FailedScheduling: 0/1 nodes are available: 1 Too many pods. Check allocatable pods with kubectl get node <node-name> -o jsonpath='{.status.allocatable.pods}'.
Fastest fix: add a second node. Most scalable fix: enable prefix delegation. Long-term fix: right-size the instance type.
Add ignore_changes = [scaling_config[0].desired_size] in Terraform if using Cluster Autoscaler, so Terraform doesn’t reset the count on every apply.