AWS Load Balancer Controller: Four Failures Before It Worked
Getting LBC running on EKS required solving four separate problems in sequence: a missing IRSA role, a template placeholder left in values, IMDS unreachable from pods, and subnet tags with a transposed cluster name.
The AWS Load Balancer Controller (LBC) is responsible for provisioning ALBs and NLBs in response to Kubernetes Ingress resources. Getting it running looked simple on paper — install via Helm, point at the cluster — but we hit four distinct failures before the first ALB was created.
Our Setup
We deploy LBC via an ArgoCD ApplicationSet defined in k8s/argocd/apps/infra-appset.yaml. The relevant section:
- name: aws-load-balancer-controller
namespace: kube-system
chart: aws-load-balancer-controller
repoURL: https://aws.github.io/eks-charts
targetRevision: "1.11.*"
wave: "1"
valuesInline: |
clusterName: mypie-eks-staging
serviceAccount:
create: true
name: aws-load-balancer-controller
annotations:
eks.amazonaws.com/role-arn: "REPLACE_WITH_LBC_IRSA_ARN"
Failure 1: No IRSA Role Existed
The first sign of trouble was the pod getting stuck in CrashLoopBackOff. Logs showed:
{"level":"error","ts":"...","msg":"failed to get VPC ID from instance metadata",
"error":"EC2 IMDS is not available, please check the IMDS service"}
Before that could even be addressed, the pod was failing an AWS auth check. We inspected the service account:
kubectl get sa aws-load-balancer-controller -n kube-system -o yaml
The service account existed with the REPLACE_WITH_LBC_IRSA_ARN annotation — which was the literal placeholder string from our template — and it had not been replaced with an actual IAM role ARN.
Root cause: The IRSA role for LBC did not exist yet. We had planned to create it via Terraform but it had been skipped.
Fix: Created the IAM role manually via AWS CLI using the official LBC IAM policy. The role trust relationship uses the OIDC provider of the EKS cluster:
# Get the OIDC provider URL
OIDC_URL=$(aws eks describe-cluster \
--name mypie-eks-staging \
--query 'cluster.identity.oidc.issuer' \
--output text | sed 's|https://||')
ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
# Create trust policy
cat > /tmp/lbc-trust.json << TRUSTEOF
{
"Version": "2012-10-17",
"Statement": [{
"Effect": "Allow",
"Principal": {
"Federated": "arn:aws:iam::${ACCOUNT_ID}:oidc-provider/${OIDC_URL}"
},
"Action": "sts:AssumeRoleWithWebIdentity",
"Condition": {
"StringEquals": {
"${OIDC_URL}:aud": "sts.amazonaws.com",
"${OIDC_URL}:sub": "system:serviceaccount:kube-system:aws-load-balancer-controller"
}
}
}]
}
TRUSTEOF
aws iam create-role \
--role-name mypie-eks-staging-aws-load-balancer-controller \
--assume-role-policy-document file:///tmp/lbc-trust.json
# Download and attach the official policy
curl -o /tmp/lbc-policy.json \
https://raw.githubusercontent.com/kubernetes-sigs/aws-load-balancer-controller/v2.11.0/docs/install/iam_policy.json
aws iam put-role-policy \
--role-name mypie-eks-staging-aws-load-balancer-controller \
--policy-name AWSLoadBalancerControllerPolicy \
--policy-document file:///tmp/lbc-policy.json
Failure 2: Placeholder Annotation Left in Values
After creating the role, we had the ARN. But we still had REPLACE_WITH_LBC_IRSA_ARN in our infra-appset.yaml. ArgoCD was already managing the deployment and would override any manual changes.
We did two things in parallel:
- Updated
infra-appset.yamlwith the real ARN (line 29):
annotations:
eks.amazonaws.com/role-arn: "arn:aws:iam::<account-id>:role/mypie-eks-staging-aws-load-balancer-controller"
- Immediately patched the live service account so we could test without waiting for ArgoCD sync:
kubectl annotate serviceaccount aws-load-balancer-controller \
-n kube-system \
eks.amazonaws.com/role-arn=arn:aws:iam::<account-id>:role/mypie-eks-staging-aws-load-balancer-controller \
--overwrite
Restarted the pod:
kubectl rollout restart deployment aws-load-balancer-controller -n kube-system
Failure 3: IMDS Unreachable from Pods
After fixing the role and annotation, the pod came back up but was still crashing:
{"level":"error","msg":"failed to get VPC ID",
"error":"couldn't retrieve VPC ID from instance metadata service,
EC2 IMDS is not available"}
LBC auto-detects the VPC ID by calling the EC2 Instance Metadata Service (IMDS) — specifically http://169.254.169.254/latest/meta-data/local-ipv4 to get the node IP, then making an EC2 API call to look up the VPC. This works when running on EC2 directly. Inside a Kubernetes pod with IMDSv2’s hop limit set to 1, the pod cannot reach IMDS (the extra network hop from the container to the node exceeds the hop limit).
Fix: Pass the VPC ID explicitly so LBC does not need to use IMDS to discover it. This is set via the vpcId Helm value.
Updated infra-appset.yaml (line 24):
valuesInline: |
clusterName: mypie-eks-staging
vpcId: vpc-<your-vpc-id>
serviceAccount:
create: true
name: aws-load-balancer-controller
annotations:
eks.amazonaws.com/role-arn: "arn:aws:iam::<account-id>:role/mypie-eks-staging-aws-load-balancer-controller"
And patched the live deployment directly for immediate effect:
kubectl patch deployment aws-load-balancer-controller -n kube-system \
--type=json \
-p='[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--aws-vpc-id=vpc-<your-vpc-id>"}]'
The pods came up healthy. Then we hit the final problem.
Failure 4: Subnet Tags with a Transposed Cluster Name
LBC was running but when we created an Ingress resource, no ALB was provisioned. The logs showed:
{
"level": "error",
"msg": "couldn't auto-discover subnets: unable to resolve at least one subnet",
"error": "2 match VPC and tags: [kubernetes.io/role/elb], 2 tagged for other cluster"
}
LBC uses two criteria to auto-discover public subnets for ALBs:
- Tag
kubernetes.io/role/elb=1(orinternal-elbfor private) - Tag
kubernetes.io/cluster/<cluster-name>=ownedorshared
We had correctly tagged the subnets for criterion 1. But criterion 2 was wrong. Our subnets were tagged:
kubernetes.io/cluster/mypie-staging-eks = owned
The actual cluster name is mypie-eks-staging — eks comes after mypie, not after staging. The two words were transposed.
Fix: Re-tag all 4 subnets (2 public for ALB, 2 private for internal):
SUBNETS=(
subnet-<public-az-a>
subnet-<public-az-b>
subnet-<private-az-a>
subnet-<private-az-b>
)
for subnet in "${SUBNETS[@]}"; do
# Remove the wrong tag
aws ec2 delete-tags \
--resources "$subnet" \
--tags Key="kubernetes.io/cluster/mypie-staging-eks"
# Add the correct tag
aws ec2 create-tags \
--resources "$subnet" \
--tags Key="kubernetes.io/cluster/mypie-eks-staging",Value="owned"
echo "Fixed $subnet"
done
After re-tagging, LBC reconciled the pending Ingress resources and provisioned the ALB within about 30 seconds:
{"level":"info","msg":"created loadBalancer",
"stackID":"mypie-tools",
"resourceID":"LoadBalancer",
"arn":"arn:aws:elasticloadbalancing:eu-central-1:<account-id>:loadbalancer/app/k8s-mypietools-.../..."}
Commit the Fix to Terraform
After all four manual fixes were validated, we committed the changes to infrastructure-as-code. The subnet tags are managed in the networking module, and the IRSA role was added to the EKS module’s IRSA configuration. The vpcId and correct role-arn are now in infra-appset.yaml:
valuesInline: |
clusterName: mypie-eks-staging
vpcId: vpc-<your-vpc-id>
serviceAccount:
create: true
name: aws-load-balancer-controller
annotations:
eks.amazonaws.com/role-arn: "arn:aws:iam::<account-id>:role/mypie-eks-staging-aws-load-balancer-controller"
replicaCount: 2
Summary of All Four Failures
| # | Error | Root Cause | Fix |
|---|---|---|---|
| 1 | IMDS not available / auth error | IRSA role did not exist | Create IAM role with correct OIDC trust |
| 2 | Pod failed to auth to AWS | Placeholder REPLACE_WITH_LBC_IRSA_ARN in annotation |
Update infra-appset.yaml line 29 + kubectl annotate |
| 3 | Failed to get VPC ID from IMDS | IMDSv2 hop limit blocks pod → IMDS access | Add vpcId to Helm values (infra-appset.yaml line 24) |
| 4 | Couldn’t auto-discover subnets | Subnet tag had transposed cluster name | Re-tag subnets to kubernetes.io/cluster/mypie-eks-staging |