GCP

GCP (GKE with CAPG) is the fully exercised cloud integration path checked into the repository for the current release line. It includes Terraform, deployment automation, CAPG bootstrap, managed node-pool smoke coverage, and KEDA/resume smoke coverage. AWS/EKS and Azure/AKS also have Terraform and registry/deploy material, but they remain experimental provider-validation paths until live CAPA (AWS) and CAPZ (Azure) parity is proven.

What The GCP Path Covers

The GCP path combines:

Terraform for base infrastructure;
GKE for the Kubernetes runtime;
Artifact Registry when a private image mirror is required;
GHCR images for the public XTrinode component image tags;
CAPI/CAPG for managed GKE cluster and node-pool integration;
Helm for operator, API server, gateway, and CRD installation.

IAM Bootstrap

The checked-in GCP path currently uses the already-authenticated gcloud account for sandbox provider validation. Use a project-admin or Owner-level account only in a disposable project, and replace it with a dedicated least-privilege service account when the provider path is hardened.

Do not commit credentials, generated kubeconfigs, Terraform state, or terraform.tfvars.
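One way to enforce this rule locally is a small pre-commit guard. The sketch below is hypothetical and not part of the repository: the function names and the file-name patterns in FORBIDDEN_PATTERNS are assumptions meant to cover the artifact types listed above, so adjust them to your layout.

```python
import fnmatch
import sys
from pathlib import PurePosixPath

# Hypothetical file-name patterns for the artifacts that must never be
# committed: Terraform variables and state, kubeconfigs, and key files.
FORBIDDEN_PATTERNS = (
    "terraform.tfvars",
    "*.tfstate",
    "*.tfstate.backup",
    "kubeconfig*",
    "*.kubeconfig",
    "*credentials*.json",
)


def forbidden_paths(paths):
    """Return the paths whose file name matches one of FORBIDDEN_PATTERNS."""
    blocked = []
    for path in paths:
        name = PurePosixPath(path).name
        # fnmatchcase keeps matching case-sensitive on every platform.
        if any(fnmatch.fnmatchcase(name, pattern) for pattern in FORBIDDEN_PATTERNS):
            blocked.append(path)
    return blocked


def main(paths):
    """Print each blocked path to stderr; return 1 if any were found, else 0."""
    blocked = forbidden_paths(paths)
    for path in blocked:
        print(f"refusing to commit: {path}", file=sys.stderr)
    return 1 if blocked else 0
```

A pre-commit hook could pass the output of `git diff --cached --name-only` to `main()` and exit with its return value. Matching on file names rather than contents keeps the check fast, but it will not catch secrets pasted into otherwise harmless files.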

Configure local auth:

export GCP_PROJECT_ID="<YOUR_PROJECT_ID>"

gcloud auth login
gcloud auth application-default login
gcloud config set project "$GCP_PROJECT_ID"

Enable the APIs used by Terraform:

gcloud services enable \
  container.googleapis.com \
  sqladmin.googleapis.com \
  artifactregistry.googleapis.com \
  servicenetworking.googleapis.com \
  compute.googleapis.com
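Before running Terraform, you can check which of these APIs are still disabled. The sketch below is a hypothetical helper that shells out to `gcloud services list --enabled --format="value(config.name)"` and compares the result with the five services enabled above. The function names are illustrative.

```python
import subprocess

# The services enabled by the gcloud command above.
REQUIRED_APIS = frozenset({
    "container.googleapis.com",
    "sqladmin.googleapis.com",
    "artifactregistry.googleapis.com",
    "servicenetworking.googleapis.com",
    "compute.googleapis.com",
})


def enabled_apis(project_id):
    """List the enabled service names for a project using the gcloud CLI."""
    result = subprocess.run(
        [
            "gcloud", "services", "list", "--enabled",
            "--project", project_id,
            "--format=value(config.name)",
        ],
        check=True,
        capture_output=True,
        text=True,
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def missing_apis(enabled):
    """Return the required APIs that are absent from `enabled`, sorted by name."""
    return sorted(REQUIRED_APIS - set(enabled))
```

For example, `missing_apis(enabled_apis(os.environ["GCP_PROJECT_ID"]))` returns an empty list once the enable command has completed.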

From-Scratch GCP Deploy

Terraform prepares the base environment:

VPC
  -> subnet
  -> firewall rules
  -> NAT
  -> optional database services
  -> Artifact Registry
  -> IAM

For a from-scratch deploy, run the main repository’s GCP targets in this order:

make gcp-management-up
make gcp-images-push
make gcp-control-plane-deploy

make gcp-management-up wraps a two-phase Terraform flow: it creates the GKE management cluster first, configures kubectl, and only then applies the Kubernetes and cloud add-ons. This is safer than a single apply because nothing is configured inside the cluster before the cluster exists. Reserve the lower-level Terraform targets for debugging Terraform itself.
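Because each target assumes that the previous one succeeded, a wrapper should stop at the first failure. The sketch below illustrates that behavior with the three first-deploy targets. The wrapper itself and its `runner` parameter, which exists so the sequence can be exercised without calling `make`, are hypothetical.

```python
import subprocess

# First-deploy targets from this page, in the order they must run.
FIRST_DEPLOY_TARGETS = (
    "gcp-management-up",
    "gcp-images-push",
    "gcp-control-plane-deploy",
)


def run_make(target):
    """Run `make <target>` and return its exit code."""
    return subprocess.run(["make", target]).returncode


def run_targets(targets, runner=run_make):
    """Run targets in order, stopping at the first non-zero exit code.

    Returns (completed, failed), where `failed` is the first target that
    failed, or None if every target succeeded.
    """
    completed = []
    for target in targets:
        if runner(target) != 0:
            return completed, target
        completed.append(target)
    return completed, None
```

If a step fails, the returned `completed` list shows which earlier steps already succeeded, so you can fix the problem and resume from the failed target.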

For private clusters, fetch Kubernetes credentials and run kubectl from a location that can reach the control plane: Cloud Shell, a bastion host, a VPN, or a network on the cluster's authorized networks list.

gcloud container clusters get-credentials xtrinode-gke-test \
  --zone us-central1-a \
  --project "$GCP_PROJECT_ID"

Images

Current component image tags are listed in Versioning. If your GCP organization requires private regional images, mirror those tags into Artifact Registry and configure the Helm image registry values accordingly.
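When you mirror images, the main task is to rewrite each GHCR reference into the Artifact Registry Docker format `LOCATION-docker.pkg.dev/PROJECT-ID/REPOSITORY/IMAGE:TAG`. The sketch below performs only that mapping. The function name and the example owner and image names are hypothetical, and the actual copy can be done with `docker pull`, `docker tag`, and `docker push`.

```python
def mirror_ref(source_ref, region, project_id, repository):
    """Map a ghcr.io image reference to an Artifact Registry reference.

    Keeps the final image name and its tag or digest, and drops the GHCR
    owner path. Raises ValueError for non-GHCR or unpinned references.
    """
    registry, _, path = source_ref.partition("/")
    if registry != "ghcr.io" or not path:
        raise ValueError(f"not a ghcr.io reference: {source_ref}")
    image = path.rsplit("/", 1)[-1]
    # Require an explicit tag (name:tag) or digest (name@sha256:...).
    if ":" not in image and "@" not in image:
        raise ValueError(f"pin a tag or digest: {source_ref}")
    return f"{region}-docker.pkg.dev/{project_id}/{repository}/{image}"
```

For example, `mirror_ref("ghcr.io/example-org/xtrinode-operator:0.1.0", "us-central1", "my-project", "xtrinode")` yields `us-central1-docker.pkg.dev/my-project/xtrinode/xtrinode-operator:0.1.0`. The Helm image registry values must then point at the same registry prefix. Their exact key names depend on the chart.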

The main repository provides a Makefile target that pushes the component images to Artifact Registry:

make gcp-images-push

If the GKE infrastructure already exists and the images are already pushed, deploy-gcp is a shorter path that only redeploys the control plane:

make deploy-gcp
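The choice between the full sequence and the short redeploy depends on two preconditions. The sketch below encodes that choice. The function is hypothetical, and the middle case (infrastructure present, images missing) is an inference: it assumes that pushing images first satisfies the precondition for deploy-gcp.

```python
# First-deploy targets from this page, in order.
FIRST_DEPLOY_TARGETS = (
    "gcp-management-up",
    "gcp-images-push",
    "gcp-control-plane-deploy",
)


def plan_deploy(infra_exists, images_pushed):
    """Return the make targets to run for the given environment state."""
    if not infra_exists:
        # Nothing to redeploy into yet: run the full ordered sequence.
        return list(FIRST_DEPLOY_TARGETS)
    if not images_pushed:
        # Assumption: once images are pushed, the short redeploy applies.
        return ["gcp-images-push", "deploy-gcp"]
    return ["deploy-gcp"]
```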

CAPI/CAPG Flow

CAPI/CAPG can create the GKE workload cluster and provider-managed node pools after the GCP control plane is deployed. The high-level sequence is:

Bring up the GCP management cluster and XTrinode control plane.
Bootstrap CAPI/CAPG management components.
Create the CAPG-managed GKE workload cluster.
Run the managed node-pool smoke path.
Inspect the CAPG workload cluster nodes.

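CAPG's GKE support is behind the experimental EXP_CAPG_GKE feature gate. Make sure EXP_CAPG_GKE=true is in effect when the CAPG management components are bootstrapped.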

After the GCP control plane is deployed, the CAPG validation flow uses:

make gcp-capg-management-up
make gcp-capg-workload-up
make gcp-capg-nodepool-smoke
make gcp-capg-workload-nodes
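The sketch below runs the four CAPG targets in order with the feature gate exported, and stops at the first failure. It assumes that the targets read EXP_CAPG_GKE from the environment. Setting the gate has no effect if the Makefile already sets it. The helper names are hypothetical.

```python
import os
import subprocess

# CAPG validation targets from this page, in order.
CAPG_TARGETS = (
    "gcp-capg-management-up",
    "gcp-capg-workload-up",
    "gcp-capg-nodepool-smoke",
    "gcp-capg-workload-nodes",
)


def capg_env(base=None):
    """Copy the environment (os.environ by default) and enable the CAPG GKE gate."""
    env = dict(os.environ if base is None else base)
    env["EXP_CAPG_GKE"] = "true"
    return env


def run_make(target, env):
    """Run `make <target>` with an explicit environment; return its exit code."""
    return subprocess.run(["make", target], env=env).returncode


def run_capg_flow(runner=run_make, base_env=None):
    """Run the CAPG targets in order; return the first failing target or None."""
    env = capg_env(base_env)
    for target in CAPG_TARGETS:
        if runner(target, env) != 0:
            return target
    return None
```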

Runtime Node Pool Example

apiVersion: analytics.xtrinode.io/v1
kind: XTrinode
metadata:
  name: analytics
  namespace: team-a
spec:
  size: s
  minWorkers: 0
  maxWorkers: 8
  nodePool:
    name: analytics-nodes
    provider: gcp
    providerMode: managed
    clusterName: xtrinode-gke-test
    kubernetesVersion: v1.35.3
    minNodes: 0
    maxNodes: 10
    nodeLabels:
      xtrinode.io/runtime: analytics
      xtrinode.io/node-pool: analytics-nodes
    gcp:
      machineType: e2-standard-4
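Some mistakes in a manifest like this can be caught before it is applied. The sketch below runs a few client-side sanity checks over the spec, expressed as a Python dict. The checks are assumptions inferred from the field names, not the operator's actual validation rules: bounds must be ordered, a gcp provider needs a machine type, and the xtrinode.io/node-pool label should match the pool name.

```python
# The spec from the manifest above, as a Python dict.
EXAMPLE_SPEC = {
    "size": "s",
    "minWorkers": 0,
    "maxWorkers": 8,
    "nodePool": {
        "name": "analytics-nodes",
        "provider": "gcp",
        "providerMode": "managed",
        "clusterName": "xtrinode-gke-test",
        "kubernetesVersion": "v1.35.3",
        "minNodes": 0,
        "maxNodes": 10,
        "nodeLabels": {
            "xtrinode.io/runtime": "analytics",
            "xtrinode.io/node-pool": "analytics-nodes",
        },
        "gcp": {"machineType": "e2-standard-4"},
    },
}


def _ordered(obj, low, high):
    """True unless both bounds are present and low exceeds high."""
    return low not in obj or high not in obj or obj[low] <= obj[high]


def validate_spec(spec):
    """Return problems found by these hypothetical client-side checks."""
    problems = []
    if not _ordered(spec, "minWorkers", "maxWorkers"):
        problems.append("minWorkers exceeds maxWorkers")
    pool = spec.get("nodePool")
    if pool is None:
        return problems
    if not _ordered(pool, "minNodes", "maxNodes"):
        problems.append("nodePool.minNodes exceeds nodePool.maxNodes")
    if pool.get("provider") == "gcp" and not pool.get("gcp", {}).get("machineType"):
        problems.append("nodePool.gcp.machineType is required for provider gcp")
    pool_label = pool.get("nodeLabels", {}).get("xtrinode.io/node-pool")
    if pool_label is not None and pool_label != pool.get("name"):
        problems.append("xtrinode.io/node-pool label does not match nodePool.name")
    return problems
```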

The operator creates the provider node-pool resources, such as the analytics-nodes MachinePool inspected below. CAPG and the cloud provider then create the actual nodes inside the existing GCP network. Cluster Autoscaler scales the node pool between minNodes and maxNodes based on pod scheduling demand.
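To reason about how minNodes and maxNodes bound scaling, a rough CPU-only estimate can help. The sketch below is not Cluster Autoscaler's algorithm, which simulates pod scheduling. It assumes perfect bin-packing of pending CPU requests and ignores scale-down. The per-node allocatable value you pass in (for example, 3500 millicores for an e2-standard-4 node) is an assumption you should replace with the value from `kubectl describe node`.

```python
import math


def estimate_nodes(pending_cpu_m, allocatable_cpu_m, current_nodes, min_nodes, max_nodes):
    """Rough CPU-only node estimate, clamped to [min_nodes, max_nodes].

    Adds just enough nodes to fit the pending CPU requests (millicores),
    assuming perfect bin-packing. Scale-down of idle nodes is not modeled.
    """
    if allocatable_cpu_m <= 0:
        raise ValueError("allocatable_cpu_m must be positive")
    extra = math.ceil(pending_cpu_m / allocatable_cpu_m) if pending_cpu_m > 0 else 0
    return max(min_nodes, min(max_nodes, current_nodes + extra))
```

With minNodes: 0 and maxNodes: 10, 8000 pending millicores on an empty pool gives 3 nodes, and any demand larger than the pool can hold is capped at 10.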

GCP Checks

kubectl get cluster -A
kubectl get machinepool -A
kubectl describe machinepool analytics-nodes -n team-a
kubectl get nodes -L xtrinode.io/node-pool,xtrinode.io/runtime
kubectl get events -n team-a --sort-by='.lastTimestamp'
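The node check above can also be scripted. The sketch below reads `kubectl get nodes -o json` and reports, for every node that carries the xtrinode.io/node-pool label for a given pool, whether its Ready condition is "True". The helper names are hypothetical. For a quick interactive check, `kubectl get nodes -l xtrinode.io/node-pool=analytics-nodes` filters on the same label.

```python
import json
import subprocess


def fetch_nodes():
    """Return the output of `kubectl get nodes -o json` as a dict."""
    out = subprocess.run(
        ["kubectl", "get", "nodes", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def is_ready(node):
    """True if the node reports a Ready condition with status "True"."""
    conditions = node.get("status", {}).get("conditions", [])
    return any(c.get("type") == "Ready" and c.get("status") == "True" for c in conditions)


def pool_nodes(nodes, pool_name):
    """Map node name to readiness for nodes labeled with the given pool."""
    result = {}
    for node in nodes.get("items", []):
        meta = node.get("metadata", {})
        if meta.get("labels", {}).get("xtrinode.io/node-pool") == pool_name:
            result[meta["name"]] = is_ready(node)
    return result
```

For example, `pool_nodes(fetch_nodes(), "analytics-nodes")` returns an empty dict while the pool is scaled to zero.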

For failed node pools, verify the GCP project, region, cluster name, IAM roles, quota, CAPG controller health, and machine type availability in the target zone.
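A good starting point for that triage is the MachinePool's status conditions. The sketch below fetches the object with `kubectl get machinepool <name> -n <namespace> -o json` and summarizes every condition whose status is not "True". The helper names are hypothetical. Some Cluster API conditions have negative polarity, for example Paused or Deleting, where "True" is the notable state, so read the summary alongside `kubectl describe`.

```python
import json
import subprocess


def fetch_machinepool(name, namespace):
    """Return a MachinePool object as a dict via kubectl."""
    out = subprocess.run(
        ["kubectl", "get", "machinepool", name, "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def non_true_conditions(obj):
    """Summarize status conditions whose status is not "True"."""
    summaries = []
    for cond in obj.get("status", {}).get("conditions", []):
        if cond.get("status") == "True":
            continue
        details = " ".join(p for p in (cond.get("reason"), cond.get("message")) if p)
        kind = cond.get("type", "<unknown>")
        summaries.append(f"{kind}: {details}" if details else kind)
    return summaries
```

For example, `non_true_conditions(fetch_machinepool("analytics-nodes", "team-a"))` lists the conditions that are still False or Unknown, often with a reason that points at quota, IAM, or machine type problems.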

Hardening Later

When the live provider paths are hardened, replace direct project-admin usage with a dedicated least-privilege service account and document the exact roles it requires.