Where AI Got It Wrong

April 26, 2026

Config that looked right, applied cleanly, and broke at runtime. Human caught each one.

AI handled the heavy lifting — Terraform, IAM, Helm, the Go app, CI/CD. But passing terraform plan doesn't mean it works. Every issue below was config that looked right, applied cleanly, and broke at runtime.

Karpenter CRDs — NodePool and NodeClass don't belong in the same root as the cluster

Claude put the NodePool and NodeClass in the same root module as the EKS cluster. Anyone familiar with how Kubernetes CRDs work would catch this before even running plan — NodePool and NodeClass are custom resources that don't exist until the Karpenter Helm chart has been deployed to the cluster. Terraform tries to validate all resources at plan time, so it fails looking for CRDs that aren't there yet. Claude had to be told to move Karpenter into its own module, applied separately after the cluster is up.

Wrong CPU architecture — first deploy failed

GitHub's standard runners are linux/amd64. Without an explicit --platform flag, docker build inherits the runner's architecture — so the image ends up amd64 by default, not by declaration. Easy to miss. No architecture constraint in the NodePool meant Karpenter picked Graviton — cheaper, but ARM64. First deploy hit exec format error. Only visible in pod logs after the node was already provisioned.

Ingress — cert ARN through the pipeline vs certificate autodiscovery

Claude set up the ingress to route traffic to the pod correctly — ALB scheme, target type, HTTPS redirect all in place. Where it went wrong was the certificate setup. It passed the ACM cert ARN through the pipeline as a Helm value, which meant separate ARNs for primary and DR and a manual value living in GitHub secrets. The right approach from the start was certificate-discovery: "true" with a host field — the LBC resolves the cert automatically from ACM, no ARN, no secrets, works identically across both regions.

Two health checks, not one

Route 53 and the ALB target group are two independent health check systems. Route 53 decides when to switch DNS. The ALB decides whether to send traffic to a pod — it was checking / by default, app returns 404 there, ALB pulled the pod from rotation and served 503. Pod was healthy the whole time, writing to DynamoDB fine. Route 53 pointed to the right region, ALB thought the pod was down. One annotation in the ingress fixes it: alb.ingress.kubernetes.io/healthcheck-path: /health/live.

DR environment copy-paste — AI didn't push back on the approach

Asked Claude to mirror dev to us-west-2 for DR. It created environments/DR/ as a near-identical copy of environments/dev/ — same structure, different variable values. Applied fine. What it didn't do was flag that any module change now needs to be made in two places, or mention that Terragrunt exists specifically for this problem. Claude executes what you ask. It doesn't question whether there's a better way. That's still on you. If your infrastructure runs only in AWS then it is worth exploring CloudFormation StackSets — it handles multi-region deployments natively. Terragrunt is the pragmatic choice if you're already in Terraform — define the structure once, stamp it out per environment.