EKS Pod Limits: When Your Node Just Can't Fit One More Pod

We bootstrapped ArgoCD on EKS and one of its pods got stuck in Pending with "Too many pods." The t3.medium limit of 17 pods caught us off guard. Here's why the limit exists, how to calculate it, and what your options are.

The Symptom

After we scaled up our mypie-infra ArgoCD ApplicationSet, it deployed several infra add-ons: cert-manager, the AWS Load Balancer Controller (LBC), metrics-server, and Atlantis. At the same time, ArgoCD was running its own 7-pod control plane. Everything scheduled fine until a restart of the argocd-repo-server pod triggered a rolling update.

The new pod couldn’t be scheduled:

Events:
  Warning  FailedScheduling  14m  default-scheduler
    0/1 nodes are available: 1 Too many pods.
    preemption: 0/1 nodes are available: 1 No preemption victims found.

$ kubectl get nodes
NAME                                          STATUS   ROLES    AGE    VERSION
ip-10-1-11-55.eu-central-1.compute.internal   Ready    <none>   95m    v1.32.12-eks-f69f56f

One node, 17 pods, every slot taken, and one pod that can’t schedule.

Why Does t3.medium Cap Out at 17 Pods?

Amazon EKS uses the VPC CNI plugin (aws-node), which assigns each pod a real VPC IP address. The number of IPs available on a node is determined by the instance type’s ENI (Elastic Network Interface) limits:

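max pods = (number of ENIs) × (IPs per ENI - 1) + 2

One address on each ENI is the interface’s own primary IP and is not handed out to pods, hence the - 1. The + 2 accounts for aws-node and kube-proxy, which run on the host network and don’t consume a pod IP.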

For t3.medium:

Attribute         Value
Max ENIs          3
Max IPs per ENI   6

max pods = 3 × (6 - 1) + 2 = 3 × 5 + 2 = 17

The limit is enforced in two places. The kubelet advertises it as the node’s pod capacity (its --max-pods setting), and the scheduler won’t place more pods on a node than the node declares it can hold.
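As a quick sanity check, the formula fits in a one-line shell function. This is only a sketch: max_pods is a name made up for this post, and the ENI and IP figures for the instance types other than t3.medium are taken from AWS's published per-instance limits, not from our cluster.

```shell
# max_pods ENIS IPS_PER_ENI
# Default VPC CNI pod limit: ENIs x (IPs per ENI - 1) + 2.
# (max_pods is a hypothetical helper, not part of any AWS tooling.)
max_pods() {
  echo $(( $1 * ($2 - 1) + 2 ))
}

max_pods 3 6     # t3.medium: 3 ENIs, 6 IPs each   -> 17
max_pods 3 12    # t3.large:  3 ENIs, 12 IPs each  -> 35
max_pods 4 15    # t3.xlarge: 4 ENIs, 15 IPs each  -> 58
max_pods 3 10    # m5.large:  3 ENIs, 10 IPs each  -> 29
```

If the number this prints disagrees with what the node reports as allocatable, something other than the default VPC CNI settings is in play (a custom --max-pods, for example).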

You can verify the limit:

kubectl get node <node-name> -o jsonpath='{.status.allocatable.pods}'
# 17

What Pods Were Taking Up All 17 Slots?

$ kubectl get pods -A --field-selector spec.nodeName=ip-10-1-11-55... \
    -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name' \
  | sort

Decomposed:

Namespace    Count  Pods
kube-system  ~11    aws-node, kube-proxy, coredns ×2, ebs-csi ×3, metrics-server, LBC ×2
argocd       6      application-controller, applicationset-controller, dex, redis, server, notifications
Total        17

The argocd-repo-server pod — the one that needed to restart — would have been pod #18.
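To get a per-namespace tally without counting by hand, the pod list can be piped through a small awk script. A minimal sketch: pods_per_namespace is a hypothetical helper, and it assumes its input is `kubectl get pods -A --no-headers` output, where the first column is the namespace. The sample lines fed to it here are made up for illustration.

```shell
# pods_per_namespace: read `kubectl get pods -A --no-headers` output on stdin
# and print "<namespace> <pod count>" per namespace, sorted by namespace.
# (Hypothetical helper; it only counts lines by their first column.)
pods_per_namespace() {
  awk '{ count[$1]++ } END { for (ns in count) print ns, count[ns] }' | sort
}

# Made-up sample input, shaped like kubectl's output:
printf '%s\n' \
  'argocd       argocd-server-abc   1/1  Running  0  95m' \
  'argocd       argocd-redis-def    1/1  Running  0  95m' \
  'kube-system  aws-node-xyz        2/2  Running  0  95m' \
  | pods_per_namespace

# Against a live cluster (NODE is whatever node you are inspecting):
#   kubectl get pods -A --field-selector spec.nodeName="$NODE" --no-headers \
#     | pods_per_namespace
```

Completed or failed pods still show up in the kubectl listing, so the tally can run slightly higher than the number of slots actually in use.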

Option 1: Add More Nodes (Quickest)

Scale the node group via AWS CLI:

aws eks update-nodegroup-config \
  --cluster-name mypie-eks-staging \
  --nodegroup-name mypie-eks-staging-general \
  --scaling-config minSize=1,maxSize=3,desiredSize=2 \
  --region eu-central-1

The second node joined in ~3 minutes. The pending pod scheduled immediately.
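For a rough idea of how many nodes a given pod count needs, ceiling division over the per-node limit is enough. The sketch below is an estimate under stated assumptions: nodes_needed is a hypothetical helper, and the optional third argument is the number of slots that daemonset pods (such as aws-node and kube-proxy) occupy on every node, which you would have to count for your own cluster. The second call uses illustrative numbers, not measurements from ours.

```shell
# nodes_needed WORKLOAD_PODS PER_NODE_LIMIT [DAEMONSET_PODS_PER_NODE]
# Ceiling division of the workload over the slots left on each node
# after per-node daemonset pods are subtracted.
# (Hypothetical helper; ignores CPU and memory, which can bind first.)
nodes_needed() (
  usable=$(( $2 - ${3:-0} ))
  if [ "$usable" -le 0 ]; then
    echo "no free pod slots per node" >&2
    exit 1
  fi
  echo $(( ($1 + usable - 1) / usable ))
)

nodes_needed 18 17      # 18 pods at 17 per node          -> 2
nodes_needed 15 17 3    # illustrative: 3 daemonset pods  -> 2
```

Pod slots are only one constraint. A node can run out of CPU or memory requests well before it runs out of IPs, so treat the result as a lower bound.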

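Update Terraform to match, so the next apply doesn’t reset desired to 1. The ignore_changes block goes a step further: once the node group exists, Terraform leaves desired_size alone and the autoscaler owns it: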

resource "aws_eks_node_group" "general" {
  # cluster_name, node_role_arn, subnet_ids, etc. omitted for brevity
  scaling_config {
    desired_size = var.environment == "production" ? 3 : 2   # was 1
    min_size     = var.environment == "production" ? 3 : 1
    max_size     = var.environment == "production" ? 10 : 3
  }

  # Prevent Terraform from overriding autoscaler-managed desired count
  lifecycle {
    ignore_changes = [scaling_config[0].desired_size]
  }
}

Option 2: Enable VPC CNI Prefix Delegation (More Pods Per Node)

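If you want to keep a single t3.medium but fit more pods, enable prefix delegation on the VPC CNI. Instead of individual IPs, each ENI slot is assigned a /28 CIDR prefix (16 addresses), which multiplies capacity by 16. Prefix delegation requires Nitro-based instances, which includes the t3 family: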

new max = (ENIs) × (IPs per ENI - 1) × 16 + 2
t3.medium: 3 × 5 × 16 + 2 = 242 pods (EKS caps this at 110 for instance types with fewer than 30 vCPUs)
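The same arithmetic, including the cap, can be sketched as a shell function. max_pods_prefix is a hypothetical helper written for this post; the cap defaults to the 110 mentioned above and can be overridden with a third argument.

```shell
# max_pods_prefix ENIS IPS_PER_ENI [CAP]
# Prefix-delegation pod limit: ENIs x (IPs per ENI - 1) x 16 + 2,
# clamped to CAP (default 110, the cap quoted in this post).
# (Hypothetical helper, not part of any AWS tooling.)
max_pods_prefix() (
  raw=$(( $1 * ($2 - 1) * 16 + 2 ))
  cap=${3:-110}
  if [ "$raw" -gt "$cap" ]; then
    echo "$cap"
  else
    echo "$raw"
  fi
)

max_pods_prefix 3 6        # t3.medium, raw 242, capped  -> 110
max_pods_prefix 3 6 250    # same inputs, higher cap     -> 242
```

In practice the cap is the number that matters on small instances: the address space stops being the bottleneck long before 242 pods would fit in 4 GiB of memory.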

Enable it on the EKS add-on:

aws eks update-addon \
  --cluster-name mypie-eks-staging \
  --addon-name vpc-cni \
  --configuration-values '{"env":{"ENABLE_PREFIX_DELEGATION":"true","WARM_PREFIX_TARGET":"1"}}' \
  --region eu-central-1

Important: After enabling prefix delegation, existing nodes need to be recycled for the new limits to take effect. The CNI calculates available IPs on startup. Rolling-restart the aws-node daemonset and cordon/drain/replace nodes, or simply scale-in and scale-out the node group.

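You also need to raise the kubelet’s --max-pods setting, because the node keeps advertising 17 until it changes. For managed node groups, set it via a launch template with a custom bootstrap script (on the Amazon Linux 2 EKS-optimized AMI, bootstrap.sh accepts --use-max-pods false together with --kubelet-extra-args '--max-pods=110'). The instance type’s network limits are the starting point for choosing the value: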

# Look up the ENI and per-ENI IPv4 limits for t3.medium (the inputs to the max-pods formula)
aws ec2 describe-instance-types \
  --instance-types t3.medium \
  --query 'InstanceTypes[0].NetworkInfo' \
  --output table

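Or use the EKS max pods calculator (the max-pods-calculator.sh script in the amazon-eks-ami repository), which takes the instance type and CNI version and can account for prefix delegation.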
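If you would rather go straight from the AWS CLI output to a number, the two relevant fields can be piped into a small function. This is a sketch under assumptions: limits_from_network_info is a hypothetical helper, it expects the two values on one whitespace-separated line (which is what --output text produces for a two-element query), and the prefix-delegation figure it prints is the raw result before any cap is applied.

```shell
# limits_from_network_info: read "<max ENIs> <IPv4 addresses per ENI>" on stdin
# and print the default limit plus the uncapped prefix-delegation limit.
# (Hypothetical helper; apply the EKS cap to the second number yourself.)
limits_from_network_info() (
  read -r enis ips
  default=$(( enis * (ips - 1) + 2 ))
  prefix=$(( enis * (ips - 1) * 16 + 2 ))
  echo "default=$default prefix_delegation_uncapped=$prefix"
)

# Sample input matching t3.medium's limits:
printf '3\t6\n' | limits_from_network_info

# Against the real API:
#   aws ec2 describe-instance-types \
#     --instance-types t3.medium \
#     --query 'InstanceTypes[0].NetworkInfo.[MaximumNetworkInterfaces,Ipv4AddressesPerInterface]' \
#     --output text | limits_from_network_info
```

For t3.medium this prints default=17 prefix_delegation_uncapped=242, matching the figures worked out by hand earlier.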

Option 3: Use a Larger Instance Type

The simplest long-term option: just use a bigger instance. t3.large gives you 35 pods, t3.xlarge gives 58, m5.large gives 29.

For staging environments, t3.large is usually sufficient and cost-effective enough.

resource "aws_eks_node_group" "general" {
  # other arguments unchanged
  instance_types = ["t3.large"]   # was "t3.medium"
}

Note: changing instance_types in an EKS managed node group replaces the node group (forces a new node group creation and old one deletion). Plan for node drain.

Which Option We Chose

For our staging cluster, we went with Option 1 — scaling to 2 nodes. It was the fastest fix and keeps costs low (2 × t3.medium ≈ $0.09/hr). We also added ignore_changes on desired_size so Cluster Autoscaler can freely scale between the min and max.

For the production cluster, we configured 3 nodes from the start with t3.large instances.

Key Takeaways

  • t3.medium on EKS has a default limit of 17 pods due to VPC CNI IP address limits.
  • The formula: max_pods = (ENIs) × (IPs-per-ENI - 1) + 2.
  • The symptom is FailedScheduling: 0/1 nodes are available: 1 Too many pods. Check allocatable pods with kubectl get node <node-name> -o jsonpath='{.status.allocatable.pods}'.
  • Fastest fix: add a second node. Most scalable fix: enable prefix delegation. Long-term fix: right-size the instance type.
  • Add ignore_changes = [scaling_config[0].desired_size] in Terraform if using Cluster Autoscaler, so Terraform doesn’t reset the count on every apply.