Troubleshooting

Start with status, then events, then logs. Most XTrinode failures surface in one of those three places before you need provider-level debugging.

Overall Health

kubectl get crd | grep xtrinode
kubectl get pods -n xtrinode-system
kubectl get pods -n xtrinode-gateway
kubectl get xtrinode -A
kubectl get events -A --sort-by='.lastTimestamp'
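
The pod listings above can be narrowed to the pods that need attention. The following is a minimal sketch, not part of XTrinode: unhealthy_pods is a hypothetical helper that assumes the default kubectl column layout (NAME, READY, STATUS) and prints pods that are not Completed and are either not Running or not fully ready.

```shell
# Hypothetical helper: list pods in a namespace that are not healthy.
# Relies on the default `kubectl get pods` columns: NAME READY STATUS ...
unhealthy_pods() {
  kubectl get pods -n "$1" --no-headers 2>/dev/null |
    awk '{
      split($2, ready, "/")            # READY column looks like "1/1"
      if ($3 == "Completed") next      # finished jobs are not a problem
      if ($3 != "Running" || ready[1] != ready[2]) print $1 "\t" $2 "\t" $3
    }'
}

# Example (run against a live cluster):
# for ns in xtrinode-system xtrinode-gateway; do unhealthy_pods "$ns"; done
```

An empty result for both namespaces means the operator and gateway pods are running and ready.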

Runtime Stuck In Reconciling

Symptoms:

status.phase stays Reconciling;
coordinator or worker resources are missing;
route state is not registered.

Checks:

kubectl describe xtrinode analytics -n team-a
kubectl logs -n xtrinode-system -l app.kubernetes.io/name=xtrinode-operator --tail=200
kubectl get events -n team-a --sort-by='.lastTimestamp'

Likely causes:

CRDs or Helm resources are incomplete;
selected catalog resources are invalid;
referenced secrets are missing;
KEDA is enabled but not installed;
node-pool provider resources are unhealthy.
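
Two of the causes above (incomplete CRDs, KEDA enabled but not installed) can be ruled out quickly. A minimal sketch, assuming the XTrinode CRD names contain the string xtrinode (the exact names are not listed here) and using KEDA's standard ScaledObject CRD name, scaledobjects.keda.sh. check_crds is a hypothetical helper.

```shell
# Hypothetical helper: report whether the expected CRDs are installed.
# "xtrinode" is a loose substring match (assumption); scaledobjects.keda.sh
# is the CRD that KEDA installs for ScaledObject resources.
check_crds() {
  crds=$(kubectl get crd -o name 2>/dev/null)
  rc=0
  for pattern in xtrinode scaledobjects.keda.sh; do
    if printf '%s\n' "$crds" | grep -qF "$pattern"; then
      echo "ok: $pattern"
    else
      echo "MISSING: $pattern"
      rc=1
    fi
  done
  return "$rc"
}

# Example (run against a live cluster):
# check_crds || echo "install the missing CRDs before debugging further"
```

A MISSING line for scaledobjects.keda.sh only matters if KEDA autoscaling is enabled for the runtime.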

Query Returns 503 Or No Backend

Checks:

kubectl get xtrinode analytics -n team-a
kubectl get service trino-analytics -n team-a
kubectl get configmap trino-gateway-routes -n xtrinode-gateway -o yaml
kubectl logs -n xtrinode-gateway -l app.kubernetes.io/name=xtrinode-gateway --tail=200

Verify:

the runtime is not intentionally suspended;
the coordinator pod is ready;
the route ConfigMap contains the expected route;
the X-Trino-XTrinode header or hostname matches the runtime route;
gateway auth is not rejecting the request.
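
The coordinator and route checks above can be scripted. A minimal sketch under stated assumptions: the route ConfigMap format is not specified here, so route_mentions_runtime only does a loose substring match on the runtime name, and coordinator_ready reuses the coordinator labels from this page. Both function names are hypothetical.

```shell
# Hypothetical helper: succeeds if the gateway route ConfigMap mentions the
# runtime name anywhere. Loose substring check; the route format is assumed.
route_mentions_runtime() {
  kubectl get configmap trino-gateway-routes -n xtrinode-gateway -o yaml |
    grep -qF "$1"
}

# Hypothetical helper: print the Ready condition ("True" or "False") of the
# coordinator pods in the given namespace.
coordinator_ready() {
  kubectl get pods -n "$1" \
    -l app.kubernetes.io/name=trino,app.kubernetes.io/component=coordinator \
    -o jsonpath='{.items[*].status.conditions[?(@.type=="Ready")].status}'
}

# Example (run against a live cluster):
# route_mentions_runtime analytics || echo "no route for analytics"
# coordinator_ready team-a
```

If the coordinator is ready and the route is present, the remaining suspects are the routing header or hostname and gateway auth.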

Workers Do Not Scale

Checks:

kubectl get scaledobject -n team-a
kubectl describe scaledobject xtrinode-analytics-workers -n team-a
kubectl logs -n xtrinode-system -l app.kubernetes.io/name=keda-operator --tail=200

Verify:

KEDA is installed;
the Prometheus or HTTP trigger is reachable;
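the trigger query returns a numeric value;
minWorkers, maxWorkers, and the KEDA threshold are consistent (minWorkers does not exceed maxWorkers, and the threshold is reachable under real load);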
Kubernetes has capacity or the node pool can scale.
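
The bounds check above can be done against the rendered ScaledObject. A minimal sketch: minReplicaCount and maxReplicaCount are standard KEDA ScaledObject fields, but how XTrinode maps minWorkers and maxWorkers onto them is an assumption. check_replica_bounds is a hypothetical helper.

```shell
# Hypothetical helper: print the replica bounds of a ScaledObject and flag an
# inverted range. Assumes minWorkers/maxWorkers end up in the standard KEDA
# fields spec.minReplicaCount and spec.maxReplicaCount.
check_replica_bounds() {
  bounds=$(kubectl get scaledobject "$1" -n "$2" \
    -o jsonpath='{.spec.minReplicaCount} {.spec.maxReplicaCount}') || return 2
  min=${bounds%% *}   # text before the first space
  max=${bounds##* }   # text after the last space
  min=${min:-0}       # KEDA default when minReplicaCount is unset
  max=${max:-100}     # KEDA default when maxReplicaCount is unset
  echo "min=$min max=$max"
  if [ "$min" -gt "$max" ]; then
    echo "inverted range: min exceeds max"
    return 1
  fi
}

# Example (run against a live cluster):
# check_replica_bounds xtrinode-analytics-workers team-a
```

KEDA also creates an HPA named keda-hpa-<scaledobject-name>; kubectl describe on that HPA shows the current metric value against the threshold.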

Coordinator CrashLoopBackOff

Checks:

kubectl logs -n team-a -l app.kubernetes.io/name=trino,app.kubernetes.io/component=coordinator --previous --tail=200
kubectl describe pods -n team-a -l app.kubernetes.io/name=trino,app.kubernetes.io/component=coordinator

Common causes:

invalid catalog configuration;
missing secret environment variable;
bad Trino config in valuesOverlay;
memory limits too low for startup or query planning.
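
The last termination reason usually separates memory problems from configuration problems. A minimal sketch using standard pod status fields; last_termination and hint_for_reason are hypothetical helpers, and the hints are only a first guess.

```shell
# Hypothetical helper: print each coordinator pod with the reason and exit
# code of its previous container termination (standard pod status fields).
last_termination() {
  kubectl get pods -n "$1" \
    -l app.kubernetes.io/name=trino,app.kubernetes.io/component=coordinator \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.containerStatuses[*].lastState.terminated.reason}{" exit="}{.status.containerStatuses[*].lastState.terminated.exitCode}{"\n"}{end}'
}

# Hypothetical helper: map a termination reason to a likely cause from the list above.
hint_for_reason() {
  case "$1" in
    OOMKilled) echo "memory: container exceeded its memory limit" ;;
    Error)     echo "config: read the --previous logs for catalog, secret, or Trino config errors" ;;
    "")        echo "no previous termination recorded" ;;
    *)         echo "other: $1" ;;
  esac
}

# Example (run against a live cluster):
# last_termination team-a
# hint_for_reason OOMKilled
```

OOMKilled points at the memory limits; a plain Error with a non-zero exit code points at the configuration causes.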

Catalog Does Not Appear

Checks:

kubectl get xtrinodecatalog -n team-a
kubectl get xtrinode analytics -n team-a -o yaml
kubectl get configmap -n team-a | grep catalog
kubectl get secret -n team-a

Verify:

spec.catalogSelector.matchLabels matches XTrinodeCatalog.spec.labels;
secrets referenced by the catalog live in the same namespace;
connector property names match the supported CRD fields;
the runtime pods were restarted (rolled) after the catalog change.
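
The selector and label comparison above is easier with both sides printed together. A minimal sketch that reads the two fields named above; show_selector_and_labels is a hypothetical helper, and the comparison itself is left to the reader.

```shell
# Hypothetical helper: print the runtime's catalog selector next to the
# labels of every XTrinodeCatalog in the same namespace.
show_selector_and_labels() {
  echo "runtime selector:"
  kubectl get xtrinode "$1" -n "$2" \
    -o jsonpath='{.spec.catalogSelector.matchLabels}{"\n"}'
  echo "catalog labels:"
  kubectl get xtrinodecatalog -n "$2" \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.labels}{"\n"}{end}'
}

# Example (run against a live cluster):
# show_selector_and_labels analytics team-a
```

Every key and value in the selector must appear in a catalog's labels for that catalog to be selected.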

Node Pool Does Not Provision

Checks:

kubectl get machinepool -A
kubectl describe machinepool analytics-nodes -n team-a
kubectl get nodes -L xtrinode.io/node-pool,xtrinode.io/runtime
kubectl get events -n team-a --sort-by='.lastTimestamp'

For GCP, confirm the CAPI/CAPG controller is installed, the clusterName matches the workload cluster, IAM roles are present, and project quota allows the requested machine type.
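
To see whether the pool has produced any nodes, compare the MachinePool status with the labeled node count. A minimal sketch under stated assumptions: the xtrinode.io/node-pool label value is assumed to equal the pool name, and the status fields are the standard Cluster API MachinePool fields. Both function names are hypothetical.

```shell
# Hypothetical helper: print phase and replica counts of a Cluster API
# MachinePool (status.phase, status.readyReplicas, spec.replicas).
machinepool_status() {
  kubectl get machinepool "$1" -n "$2" \
    -o jsonpath='{.status.phase} ready={.status.readyReplicas} desired={.spec.replicas}{"\n"}'
}

# Hypothetical helper: count nodes carrying the pool label.
# Assumes the label value equals the pool name.
count_pool_nodes() {
  kubectl get nodes -l "xtrinode.io/node-pool=$1" --no-headers 2>/dev/null |
    wc -l | tr -d ' '
}

# Example (run against a live cluster):
# machinepool_status analytics-nodes team-a
# count_pool_nodes analytics-nodes
```

A desired count above zero with no labeled nodes usually means the provider has not created machines yet; the MachinePool events and the provider controller logs say why.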

Useful Debug Bundle

Collect the following output when escalating:

kubectl get xtrinode analytics -n team-a -o yaml
kubectl describe xtrinode analytics -n team-a
kubectl get xtrinodecatalog -n team-a -o yaml
kubectl get pods -n team-a -o wide
kubectl get scaledobject -n team-a -o yaml
kubectl logs -n xtrinode-system -l app.kubernetes.io/name=xtrinode-operator --tail=300
kubectl logs -n xtrinode-gateway -l app.kubernetes.io/name=xtrinode-gateway --tail=300
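
The commands above can be wrapped so that the output lands in one archive. A minimal sketch: collect_bundle is a hypothetical helper, and the file names are arbitrary. Failures are captured in the output files rather than aborting, so a partly broken cluster still yields a bundle.

```shell
# Hypothetical helper: write the debug bundle to a directory and archive it.
# Usage: collect_bundle <runtime> <namespace> <output-dir>
collect_bundle() {
  name=$1 ns=$2 out=$3
  mkdir -p "$out" || return 1
  kubectl get xtrinode "$name" -n "$ns" -o yaml > "$out/xtrinode.yaml" 2>&1
  kubectl describe xtrinode "$name" -n "$ns" > "$out/xtrinode.describe.txt" 2>&1
  kubectl get xtrinodecatalog -n "$ns" -o yaml > "$out/catalogs.yaml" 2>&1
  kubectl get pods -n "$ns" -o wide > "$out/pods.txt" 2>&1
  kubectl get scaledobject -n "$ns" -o yaml > "$out/scaledobjects.yaml" 2>&1
  kubectl logs -n xtrinode-system -l app.kubernetes.io/name=xtrinode-operator \
    --tail=300 > "$out/operator.log" 2>&1
  kubectl logs -n xtrinode-gateway -l app.kubernetes.io/name=xtrinode-gateway \
    --tail=300 > "$out/gateway.log" 2>&1
  # Archive next to the directory, e.g. ./bundle -> ./bundle.tar.gz
  tar -czf "$out.tar.gz" -C "$(dirname "$out")" "$(basename "$out")"
}

# Example (run against a live cluster):
# collect_bundle analytics team-a ./xtrinode-debug
```

Review the files for credentials or internal hostnames before sharing the archive.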