AWS

AWS/EKS/CAPA support is present as checked-in infrastructure and deployment material, but it is an experimental provider-validation path. Treat this page as a status and orientation note, not a hardened runbook or parity claim with the GCP path.

Area                 Status
Terraform            Checked-in module material for VPC, EKS, ECR, optional RDS, namespaces, and autoscaler wiring.
Deploy automation    Checked-in scripts exist for provider validation; live parity with GCP is still tracked.
Registry             ECR-backed image publishing is wired in the repository path.
Runtime node pools   AWS provider resource generation is covered locally; live CAPA-managed node-pool parity remains tracked.
Auth posture         Current validation uses an already-authenticated admin/root-equivalent AWS CLI profile in a sandbox account.

Use AWS only for sandbox/provider validation until the live smoke path is promoted. Do not commit access keys, generated kubeconfigs, Terraform state, or terraform.tfvars.
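
The "do not commit" rule above can be backed by a small local check. The following is a minimal sketch, not part of the repository: the pattern list and the function names are hypothetical, and the kubeconfig pattern in particular is a guess at how generated files might be named. It flags file names that look like Terraform state, terraform.tfvars, or kubeconfigs, and text that contains something shaped like an AWS access key ID.

```python
import re
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Hypothetical patterns; adjust to the repository's actual layout.
FORBIDDEN_PATTERNS = (
    "*.tfstate",
    "*.tfstate.*",       # for example terraform.tfstate.backup
    "terraform.tfvars",
    "*kubeconfig*",      # assumed naming for generated kubeconfigs
)

# AWS access key IDs start with AKIA (long-term) or ASIA (temporary),
# followed by 16 uppercase alphanumeric characters.
ACCESS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")


def find_forbidden(paths):
    """Return the paths whose file name matches a forbidden pattern."""
    hits = []
    for path in paths:
        name = PurePosixPath(path).name
        if any(fnmatch(name, pattern) for pattern in FORBIDDEN_PATTERNS):
            hits.append(path)
    return hits


def contains_access_key_id(text):
    """Return True if the text contains a string shaped like an access key ID."""
    return ACCESS_KEY_ID.search(text) is not None
```

One way to use it is to pass the output of `git diff --cached --name-only` to `find_forbidden` before committing. A name-based check like this complements, and does not replace, ignore rules and secret scanning.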

The repository includes:

  • Terraform for the base EKS environment;
  • ECR image publishing flow for XTrinode components;
  • deployment script wiring for CRDs, API server auth Secrets, Helm charts, and optional observability;
  • node-pool resource generation code and tests;
  • Spot quota notes for EKS node groups.

The main repository carries AWS deploy targets and scripts, but this page intentionally stays at status level until the live provider smoke path is promoted.

  • The path is not yet documented as a production runbook.
  • Live CAPA-managed node-pool parity is still tracked.
  • Least-privilege IAM permissions are not yet documented.
  • Local kubectl and Helm access may require temporary EKS endpoint changes or a private access path, depending on the Terraform settings.
  • Validate suspend, resume, gateway routing, KEDA scaling, image pulls, and node-pool behavior in the target account before relying on the path.
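
Whether local kubectl and Helm access needs an endpoint change depends on how the cluster's API endpoint is exposed. As a rough illustration, the sketch below classifies that exposure from a dictionary shaped like the `cluster` object returned by the EKS DescribeCluster API (for example, the output of `aws eks describe-cluster`). The function name and the returned labels are hypothetical and are not taken from the repository's scripts.

```python
def endpoint_access(cluster):
    """Classify API endpoint exposure from a DescribeCluster-style dict."""
    vpc = cluster.get("resourcesVpcConfig", {})
    public = vpc.get("endpointPublicAccess", False)
    private = vpc.get("endpointPrivateAccess", False)
    cidrs = vpc.get("publicAccessCidrs", [])

    if public and "0.0.0.0/0" in cidrs:
        return "public-open"        # reachable from any address
    if public:
        return "public-restricted"  # reachable only from the listed CIDRs
    if private:
        return "private-only"       # needs a path inside the VPC
    return "unknown"


# Example with made-up values: a cluster reachable only from inside the VPC.
sample = {
    "resourcesVpcConfig": {
        "endpointPublicAccess": False,
        "endpointPrivateAccess": True,
        "publicAccessCidrs": [],
    }
}
print(endpoint_access(sample))  # private-only
```

A "private-only" or "public-restricted" result suggests that a workstation outside the VPC or outside the allowed CIDRs will need a private access path or a temporary endpoint change.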

If the EKS path uses Spot-backed node groups, check the EC2 Spot vCPU quota for standard instance families before debugging autoscaling. AWS can surface Spot vCPU quota issues as generic Fleet request failures.
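
Because the Spot quota is counted in vCPUs rather than instances, a quick arithmetic check can rule the quota in or out before looking at the autoscaler. The helper below is a sketch with a hypothetical name and made-up numbers; the real quota and current usage have to be read from the target account.

```python
def spot_vcpu_headroom(quota_vcpus, in_use_vcpus, vcpus_per_node, desired_nodes):
    """Return the Spot vCPUs left after the request; negative means over quota."""
    requested = vcpus_per_node * desired_nodes
    return quota_vcpus - in_use_vcpus - requested


# Made-up example: 32 vCPU quota, 8 vCPUs already in use, six 4-vCPU nodes.
print(spot_vcpu_headroom(32, 8, 4, 6))  # 0: the request exactly fits

# One more node of the same size would exceed the quota by 4 vCPUs.
print(spot_vcpu_headroom(32, 8, 4, 7))  # -4
```

A negative result points at the quota rather than the autoscaler configuration as the first thing to investigate.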

Before this page becomes a full runbook, AWS needs live validation for:

  • Terraform from-zero deployment;
  • registry image push and pull from EKS nodes;
  • Helm install and upgrade;
  • API server auth and gateway resume-token flow;
  • gateway routing, suspend, resume, KEDA scaling, and teardown;
  • managed node-pool behavior through the chosen AWS provider integration.

After that validation, replace the admin/root-equivalent profile with a least-privilege IAM role or user, and document the exact permissions it requires.