Troubleshooting
Start with status, events, then logs. Most XTrinode failures show up in one of those three places before you need provider-level debugging.
Overall Health
kubectl get crd | grep xtrinode
kubectl get pods -n xtrinode-system
kubectl get pods -n xtrinode-gateway
kubectl get xtrinode -A
kubectl get events -A --sort-by='.lastTimestamp'
Runtime Stuck In Reconciling
Symptoms:
- status.phase stays Reconciling;
- coordinator or worker resources are missing;
- route state is not registered.
Checks:
kubectl describe xtrinode analytics -n team-a
kubectl logs -n xtrinode-system -l app.kubernetes.io/name=xtrinode-operator
kubectl get events -n team-a --sort-by='.lastTimestamp'
Likely causes:
- CRDs or Helm resources are incomplete;
- selected catalog resources are invalid;
- referenced secrets are missing;
- KEDA is enabled but not installed;
- node-pool provider resources are unhealthy.
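To tell a slow reconcile from a stuck one, it can help to poll status.phase for a bounded time. The following is a minimal sketch: the function name, the attempt count, and the delay are illustrative assumptions, and only the status.phase field and the Reconciling value come from this page.

```shell
# Poll status.phase until it is no longer "Reconciling" or the attempts run out.
# Usage: wait_past_reconciling <runtime> <namespace> [attempts] [delay_seconds]
# The defaults (30 attempts, 10 s apart) are arbitrary illustration values.
wait_past_reconciling() {
  name="$1"; namespace="$2"; attempts="${3:-30}"; delay="${4:-10}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    phase="$(kubectl get xtrinode "$name" -n "$namespace" -o jsonpath='{.status.phase}')"
    if [ -n "$phase" ] && [ "$phase" != "Reconciling" ]; then
      echo "phase: $phase"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "still Reconciling after $attempts checks" >&2
  return 1
}
```

For example, `wait_past_reconciling analytics team-a` waits up to about five minutes. A non-zero exit is the signal to work through the likely causes above.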
Query Returns 503 Or No Backend
Checks:
kubectl get xtrinode analytics -n team-a
kubectl get service trino-analytics -n team-a
kubectl get configmap trino-gateway-routes -n xtrinode-gateway -o yaml
kubectl logs -n xtrinode-gateway -l app.kubernetes.io/name=xtrinode-gateway
Verify:
- the runtime is not intentionally suspended;
- the coordinator pod is ready;
- the route ConfigMap contains the expected route;
- the X-Trino-XTrinode header or hostname matches the runtime route;
- gateway auth is not rejecting the request.
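Two of these checks can be scripted. The sketch below rests on one assumption that this page does not confirm: that a registered route mentions the runtime name somewhere in the trino-gateway-routes ConfigMap. The function names are illustrative. The readiness check reads the standard Kubernetes Ready pod condition.

```shell
# Succeeds if the route ConfigMap mentions the runtime name.
# Assumption: route entries contain the runtime name as plain text.
route_registered() {
  kubectl get configmap trino-gateway-routes -n xtrinode-gateway -o yaml \
    | grep -- "$1" > /dev/null
}

# Print each coordinator pod in the namespace with the status of its Ready condition.
coordinator_readiness() {
  kubectl get pods -n "$1" \
    -l app.kubernetes.io/name=trino,app.kubernetes.io/component=coordinator \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Ready")].status}{"\n"}{end}'
}
```

Running `route_registered analytics || echo "no route for analytics"` followed by `coordinator_readiness team-a` narrows a 503 to either the gateway side or the runtime side.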
Workers Do Not Scale
Checks:
kubectl get scaledobject -n team-a
kubectl describe scaledobject xtrinode-analytics-workers -n team-a
kubectl logs -n xtrinode-system -l app.kubernetes.io/name=keda-operator
Verify:
- KEDA is installed;
- the Prometheus or HTTP trigger is reachable;
- the query returns a number;
- minWorkers, maxWorkers, and the KEDA threshold are sane;
- Kubernetes has capacity or the node pool can scale.
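The ScaledObject status conditions summarize most of this list. The sketch below prints them and tests for Ready=True. The function names are illustrative, and the bounds check encodes one possible reading of "sane" (0 <= minWorkers <= maxWorkers), which is an assumption and not a rule stated on this page.

```shell
# Print the ScaledObject conditions as TYPE=STATUS, one per line.
scaledobject_conditions() {
  kubectl get scaledobject "$1" -n "$2" \
    -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'
}

# Succeeds only if the ScaledObject reports Ready=True.
scaledobject_is_ready() {
  scaledobject_conditions "$1" "$2" | grep -x 'Ready=True' > /dev/null
}

# Assumed sanity rule: minWorkers is not negative and does not exceed maxWorkers.
worker_bounds_sane() {
  [ "$1" -ge 0 ] && [ "$1" -le "$2" ]
}
```

If `scaledobject_is_ready xtrinode-analytics-workers team-a` fails, the trigger is usually the place to look. If it succeeds but workers still do not appear, check cluster capacity and the node pool.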
Coordinator CrashLoopBackOff
Checks:
kubectl logs -n team-a -l app.kubernetes.io/name=trino,app.kubernetes.io/component=coordinator --previous
kubectl describe pods -n team-a -l app.kubernetes.io/name=trino,app.kubernetes.io/component=coordinator
Common causes:
- invalid catalog configuration;
- missing secret environment variable;
- bad Trino config in valuesOverlay;
- memory limits too low for startup or query planning.
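The last termination reason often separates a memory problem from a configuration problem before you read the logs. A minimal sketch, with illustrative function names; it reads the standard Kubernetes lastState.terminated fields and treats the OOMKilled reason as the sign of a memory limit that is too low.

```shell
# Print each coordinator pod with the reason and exit code of its last termination.
coordinator_last_exit() {
  kubectl get pods -n "$1" \
    -l app.kubernetes.io/name=trino,app.kubernetes.io/component=coordinator \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].lastState.terminated.reason}{"\t"}{.status.containerStatuses[*].lastState.terminated.exitCode}{"\n"}{end}'
}

# Succeeds if any coordinator container was last terminated by the OOM killer.
coordinator_was_oom_killed() {
  coordinator_last_exit "$1" | grep 'OOMKilled' > /dev/null
}
```

When `coordinator_was_oom_killed team-a` succeeds, raise the memory limit first. Otherwise the `--previous` logs are more likely to show a catalog, secret, or valuesOverlay error.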
Catalog Does Not Appear
Checks:
kubectl get xtrinodecatalog -n team-a
kubectl get xtrinode analytics -n team-a -o yaml
kubectl get configmap -n team-a | grep catalog
kubectl get secret -n team-a
Verify:
- spec.catalogSelector.matchLabels matches XTrinodeCatalog.spec.labels;
- secrets referenced by the catalog live in the same namespace;
- connector property names match the supported CRD fields;
- the runtime rolled after the catalog change.
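Label mismatches are easier to spot when the selector and the catalog labels are printed together. The sketch below uses the two field paths named in the list above; the function names are illustrative, and the output format depends on how the CRDs serialize those fields.

```shell
# Print the runtime's catalog selector labels.
runtime_selector_labels() {
  kubectl get xtrinode "$1" -n "$2" -o jsonpath='{.spec.catalogSelector.matchLabels}'
  echo
}

# Print every catalog in the namespace with its spec.labels.
catalog_labels() {
  kubectl get xtrinodecatalog -n "$1" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.labels}{"\n"}{end}'
}
```

Run `runtime_selector_labels analytics team-a` and `catalog_labels team-a`, then compare the output by eye. A catalog whose labels do not include every selector key and value will not be picked up.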
Node Pool Does Not Provision
Checks:
kubectl get machinepool -A
kubectl describe machinepool analytics-nodes -n team-a
kubectl get nodes -L xtrinode.io/node-pool,xtrinode.io/runtime
kubectl get events -n team-a --sort-by='.lastTimestamp'
For GCP, confirm the CAPI/CAPG controller is installed, the clusterName
matches the workload cluster, IAM roles are present, and project quota allows
the requested machine type.
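A quick comparison of desired and ready replicas against the nodes that actually joined can show where provisioning stops. This sketch assumes the MachinePool is a Cluster API resource exposing status.phase, status.readyReplicas, and spec.replicas, and that the xtrinode.io/node-pool label value equals the pool name. Neither assumption is confirmed by this page, and the function names are illustrative.

```shell
# Summarize a MachinePool: phase, ready replicas, desired replicas.
# Assumes Cluster API MachinePool status fields.
machinepool_summary() {
  kubectl get machinepool "$1" -n "$2" \
    -o jsonpath='{.status.phase}{" ready="}{.status.readyReplicas}{" desired="}{.spec.replicas}{"\n"}'
}

# Count nodes carrying the pool label.
# Assumption: the label value is the pool name.
count_pool_nodes() {
  kubectl get nodes -l "xtrinode.io/node-pool=$1" --no-headers | wc -l | tr -d ' '
}
```

If `machinepool_summary analytics-nodes team-a` shows the desired count but `count_pool_nodes analytics-nodes` stays at 0, the machines are probably failing at the provider or during node bootstrap, not in the XTrinode operator.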
Useful Debug Bundle
Collect this when escalating:
kubectl get xtrinode analytics -n team-a -o yaml
kubectl describe xtrinode analytics -n team-a
kubectl get xtrinodecatalog -n team-a -o yaml
kubectl get pods -n team-a -o wide
kubectl get scaledobject -n team-a -o yaml
kubectl logs -n xtrinode-system -l app.kubernetes.io/name=xtrinode-operator --tail=300
kubectl logs -n xtrinode-gateway -l app.kubernetes.io/name=xtrinode-gateway --tail=300
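The same commands can be wrapped so that each output lands in its own file and the result is archived. This is a sketch only: the function names, file names, and default output directory are illustrative choices, and a failing command is recorded in its file so that the rest of the bundle is still collected.

```shell
# Run a command and write stdout and stderr to a file; note failures in the file.
capture() {
  file="$1"; shift
  "$@" > "$file" 2>&1 || echo "command failed: $*" >> "$file"
}

# Usage: collect_debug_bundle <runtime> <namespace> [output_dir]
collect_debug_bundle() {
  runtime="$1"; namespace="$2"; out="${3:-xtrinode-debug}"
  mkdir -p "$out" || return 1
  capture "$out/xtrinode.yaml" kubectl get xtrinode "$runtime" -n "$namespace" -o yaml
  capture "$out/xtrinode.describe.txt" kubectl describe xtrinode "$runtime" -n "$namespace"
  capture "$out/catalogs.yaml" kubectl get xtrinodecatalog -n "$namespace" -o yaml
  capture "$out/pods.txt" kubectl get pods -n "$namespace" -o wide
  capture "$out/scaledobjects.yaml" kubectl get scaledobject -n "$namespace" -o yaml
  capture "$out/operator.log" kubectl logs -n xtrinode-system \
    -l app.kubernetes.io/name=xtrinode-operator --tail=300
  capture "$out/gateway.log" kubectl logs -n xtrinode-gateway \
    -l app.kubernetes.io/name=xtrinode-gateway --tail=300
  # Archive the directory next to itself as <output_dir>.tar.gz.
  tar -czf "$out.tar.gz" -C "$(dirname "$out")" "$(basename "$out")"
}
```

Calling `collect_debug_bundle analytics team-a` produces xtrinode-debug.tar.gz in the current directory. Review the files for sensitive values before sharing them.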