GCP

GCP/GKE/CAPG is the fully exercised, checked-in cloud integration path for the current release line. It covers Terraform, deployment automation, CAPG bootstrap, managed node-pool smoke coverage, and KEDA/resume smoke coverage. AWS/EKS and Azure/AKS also have Terraform and registry/deploy material, but they remain experimental provider-validation paths until live CAPA/CAPZ parity is proven.

The GCP path combines:

  • Terraform for base infrastructure;
  • GKE for the Kubernetes runtime;
  • Artifact Registry when a private image mirror is required;
  • GHCR images for the public XTrinode component image tags;
  • CAPI/CAPG for managed GKE cluster and node-pool integration;
  • Helm for operator, API server, gateway, and CRD installation.

The checked-in GCP path currently uses the already-authenticated gcloud account for sandbox provider validation. Use a project-admin or Owner-level account only in a disposable project, and replace it with a dedicated least-privilege service account when the provider path is hardened.

Do not commit credentials, generated kubeconfigs, Terraform state, or terraform.tfvars.
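
One way to back this rule with tooling is a set of ignore patterns. The sketch below appends patterns to .gitignore without duplicating entries. Only terraform.tfvars and Terraform state are named above; the kubeconfig and key-file patterns (*.kubeconfig, *-key.json) are assumptions and should be adjusted to the file names your workflow actually generates.

```shell
# Append ignore patterns to a .gitignore file, skipping entries already present.
add_ignore_patterns() {
  local gitignore="${1:-.gitignore}"
  local pattern
  touch "$gitignore"
  # Make sure the file ends with a newline before appending to it.
  if [ -s "$gitignore" ] && [ -n "$(tail -c 1 "$gitignore")" ]; then
    printf '\n' >> "$gitignore"
  fi
  for pattern in \
    'terraform.tfvars' \
    '*.tfstate' \
    '*.tfstate.*' \
    '.terraform/' \
    '*.kubeconfig' \
    '*-key.json'
  do
    # -x matches the whole line, -F treats the pattern as a fixed string.
    grep -qxF -- "$pattern" "$gitignore" || printf '%s\n' "$pattern" >> "$gitignore"
  done
}

# Usage (from the repository root):
#   add_ignore_patterns .gitignore
#   git check-ignore -v terraform.tfvars
```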

Configure local auth:

export GCP_PROJECT_ID="<YOUR_PROJECT_ID>"
gcloud auth login
gcloud auth application-default login
gcloud config set project "$GCP_PROJECT_ID"
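
Before running any of the later targets, it can help to confirm that the shell really points at the intended project. The following is a minimal preflight sketch, not part of the repository; the function name check_gcp_auth is hypothetical.

```shell
# Fail early if gcloud is not configured for the project in GCP_PROJECT_ID.
check_gcp_auth() {
  if [ -z "${GCP_PROJECT_ID:-}" ]; then
    echo "GCP_PROJECT_ID is not set" >&2
    return 1
  fi

  local active_project
  active_project="$(gcloud config get-value project 2>/dev/null)"
  if [ "$active_project" != "$GCP_PROJECT_ID" ]; then
    echo "gcloud project is '$active_project', expected '$GCP_PROJECT_ID'" >&2
    return 1
  fi

  # Empty output means no account is currently logged in.
  local account
  account="$(gcloud auth list --filter=status:ACTIVE --format='value(account)')"
  if [ -z "$account" ]; then
    echo "no active gcloud account" >&2
    return 1
  fi

  echo "project=$active_project account=$account"
}

# Usage:
#   check_gcp_auth || exit 1
```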

Enable the APIs used by Terraform:

gcloud services enable \
  container.googleapis.com \
  sqladmin.googleapis.com \
  artifactregistry.googleapis.com \
  servicenetworking.googleapis.com \
  compute.googleapis.com
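
API enablement can take a moment to propagate, so a quick check before running Terraform may save a failed apply. This sketch lists any of the five APIs above that are not yet enabled; the helper name missing_apis is hypothetical.

```shell
# APIs that Terraform expects to be enabled (same list as above).
required_apis="container.googleapis.com sqladmin.googleapis.com artifactregistry.googleapis.com servicenetworking.googleapis.com compute.googleapis.com"

# Print each required API that does not appear in the project's enabled list.
missing_apis() {
  local enabled api
  enabled="$(gcloud services list --enabled \
    --format='value(config.name)' \
    --project "$GCP_PROJECT_ID")"
  for api in $required_apis; do
    printf '%s\n' "$enabled" | grep -qxF -- "$api" || echo "$api"
  done
}

# Usage: empty output means every required API is enabled.
#   missing_apis
```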

Terraform prepares the base environment:

VPC
-> subnet
-> firewall rules
-> NAT
-> optional database services
-> Artifact Registry
-> IAM

For a first deploy from scratch, use the main repository’s ordered GCP targets:

make gcp-management-up
make gcp-images-push
make gcp-control-plane-deploy

make gcp-management-up wraps the safer two-phase Terraform flow: create the GKE management cluster first, configure kubectl, then apply the Kubernetes and cloud add-ons. Use the lower-level Terraform targets only for Terraform-only debugging.
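
For orientation, the general shape of a two-phase apply looks roughly like the sketch below. This is not the Makefile's actual implementation: the Terraform address passed as the first argument (for example module.gke) is hypothetical, and the real targets may split the phases differently. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

two_phase_apply() {
  local cluster_target="$1"  # hypothetical Terraform address, e.g. module.gke
  local cluster_name="$2"
  local zone="$3"

  # Phase 1: create only the cluster and the resources it depends on.
  run terraform apply -target="$cluster_target" || return 1

  # Configure kubectl so Kubernetes-backed resources can reach the API server.
  run gcloud container clusters get-credentials "$cluster_name" \
    --zone "$zone" --project "$GCP_PROJECT_ID" || return 1

  # Phase 2: apply everything else, including in-cluster add-ons.
  run terraform apply
}

# Usage:
#   DRY_RUN=1 two_phase_apply module.gke xtrinode-gke-test us-central1-a
```

Splitting the apply this way avoids asking Terraform to configure in-cluster resources before the cluster's API endpoint exists.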

For private clusters, obtain Kubernetes credentials from Cloud Shell, a bastion, a VPN, or an address listed in the cluster's authorized networks:

gcloud container clusters get-credentials xtrinode-gke-test \
  --zone us-central1-a \
  --project "$GCP_PROJECT_ID"
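
After fetching credentials, it is worth confirming that kubectl points at the expected cluster. gcloud names GKE contexts gke_PROJECT_LOCATION_CLUSTER, so a small check can compare against that. The helper names below are hypothetical.

```shell
# Build the kubeconfig context name that gcloud generates for a GKE cluster.
expected_gke_context() {
  echo "gke_${1}_${2}_${3}"   # project, zone or region, cluster name
}

# Return non-zero if kubectl's current context is not the expected cluster.
check_kube_context() {
  local zone="$1" cluster="$2" want current
  want="$(expected_gke_context "$GCP_PROJECT_ID" "$zone" "$cluster")"
  current="$(kubectl config current-context)"
  if [ "$current" != "$want" ]; then
    echo "kubectl context is '$current', expected '$want'" >&2
    return 1
  fi
}

# Usage:
#   check_kube_context us-central1-a xtrinode-gke-test && kubectl get nodes
```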

Current component image tags are listed in Versioning. If your GCP organization requires private regional images, mirror those tags into Artifact Registry and configure the Helm image registry values accordingly.
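
Mirroring by hand follows the usual pull, tag, push pattern. The sketch below is an illustration rather than what make gcp-images-push does: the image name, organization, and repository in the usage example are placeholders. Only the Artifact Registry host format (REGION-docker.pkg.dev/PROJECT/REPOSITORY/IMAGE:TAG) is fixed by GCP.

```shell
# Map a source image reference to an Artifact Registry reference.
# The final path component and tag are preserved.
mirror_ref() {
  local source_ref="$1" region="$2" project="$3" repo="$4"
  local name_and_tag="${source_ref##*/}"   # e.g. operator:v1.2.3
  echo "${region}-docker.pkg.dev/${project}/${repo}/${name_and_tag}"
}

# Pull the source image, retag it, and push it to Artifact Registry.
mirror_image() {
  local target
  target="$(mirror_ref "$@")"
  docker pull "$1" && docker tag "$1" "$target" && docker push "$target"
}

# Usage (placeholder names):
#   gcloud auth configure-docker us-central1-docker.pkg.dev
#   mirror_image ghcr.io/example-org/operator:v1.2.3 us-central1 "$GCP_PROJECT_ID" xtrinode
```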

The main repository provides a Makefile target for the Artifact Registry image path:

make gcp-images-push

If the GKE infrastructure already exists and images are already pushed, deploy-gcp is the shorter control-plane redeploy path:

make deploy-gcp

CAPI/CAPG can create the GKE workload cluster and provider-managed node pools after the GCP control plane is deployed. The high-level sequence is:

  1. Bring up the GCP management cluster and XTrinode control plane.
  2. Bootstrap CAPI/CAPG management components.
  3. Create the CAPG-managed GKE workload cluster.
  4. Run the managed node-pool smoke path.
  5. Inspect the CAPG workload cluster nodes.

CAPG GKE support uses the EXP_CAPG_GKE=true feature gate.
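
When bootstrapping by hand with clusterctl, the gate has to be exported before the provider is initialized, because clusterctl substitutes environment variables into the provider manifests at init time. A minimal sketch, assuming a plain clusterctl workflow; whether the repository's make targets do exactly this is an assumption, and your CAPG version may need further variables (credentials, additional feature gates).

```shell
# Initialize the CAPG provider with GKE support enabled.
capg_init() {
  # Must be set before clusterctl renders the provider manifests.
  export EXP_CAPG_GKE=true
  clusterctl init --infrastructure gcp
}

# Usage:
#   capg_init
```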

After the GCP control plane is deployed, the CAPG validation flow uses:

make gcp-capg-management-up
make gcp-capg-workload-up
make gcp-capg-nodepool-smoke
make gcp-capg-workload-nodes

The following XTrinode manifest requests a managed GCP node pool:

apiVersion: analytics.xtrinode.io/v1
kind: XTrinode
metadata:
  name: analytics
  namespace: team-a
spec:
  size: s
  minWorkers: 0
  maxWorkers: 8
  nodePool:
    name: analytics-nodes
    provider: gcp
    providerMode: managed
    clusterName: xtrinode-gke-test
    kubernetesVersion: v1.35.3
    minNodes: 0
    maxNodes: 10
    nodeLabels:
      xtrinode.io/runtime: analytics
      xtrinode.io/node-pool: analytics-nodes
    gcp:
      machineType: e2-standard-4

The operator creates the provider node-pool resources. CAPG and the cloud provider create the actual nodes inside the existing GCP network. Cluster Autoscaler then scales the node pool based on pod scheduling demand.

kubectl get cluster -A
kubectl get machinepool -A
kubectl describe machinepool analytics-nodes -n team-a
kubectl get nodes -L xtrinode.io/node-pool,xtrinode.io/runtime
kubectl get events -n team-a --sort-by='.lastTimestamp'
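
Because the example pool starts at minNodes: 0, nodes appear only after pods are scheduled against it. A polling helper along these lines can be useful in smoke scripts. It is a sketch with hypothetical function names, and it relies on the xtrinode.io/node-pool label shown above.

```shell
# Count nodes that carry the given node-pool label value.
pool_node_count() {
  kubectl get nodes -l "xtrinode.io/node-pool=$1" --no-headers 2>/dev/null \
    | wc -l | tr -d '[:space:]'
}

# Wait until at least `want` nodes exist in the pool, or time out.
wait_for_pool_nodes() {
  local pool="$1" want="$2" timeout="${3:-600}" interval="${4:-15}"
  local waited=0
  while [ "$(pool_node_count "$pool")" -lt "$want" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out waiting for $want node(s) in pool '$pool'" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
}

# Usage: wait up to 10 minutes for the first node.
#   wait_for_pool_nodes analytics-nodes 1 600 15
```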

For failed node pools, verify the GCP project, region, cluster name, IAM roles, quota, CAPG controller health, and machine type availability.
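
Several of these checks can be gathered in one pass. The sketch below assumes CAPG was installed with clusterctl defaults, which place the controller in the capg-system namespace as the capg-controller-manager deployment; adjust the names if your installation differs. The function name diagnose_node_pool is hypothetical.

```shell
# Collect basic diagnostics for a node pool that failed to provision.
diagnose_node_pool() {
  local pool="$1" namespace="$2" machine_type="$3" zone="$4"

  echo "== MachinePool status =="
  kubectl describe machinepool "$pool" -n "$namespace"

  echo "== CAPG controller health =="
  kubectl get pods -n capg-system
  kubectl logs -n capg-system deployment/capg-controller-manager --tail=50

  echo "== Machine type availability in $zone =="
  gcloud compute machine-types list \
    --zones "$zone" \
    --filter="name=$machine_type" \
    --project "$GCP_PROJECT_ID"
}

# Usage:
#   diagnose_node_pool analytics-nodes team-a e2-standard-4 us-central1-a
```

An empty machine-type listing suggests the type is not offered in that zone. Quota and IAM problems usually surface in the MachinePool events or the controller log.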

After live provider paths are hardened, replace direct project-admin usage with a dedicated service account and document the exact required roles.