
Troubleshooting

Start with status, events, then logs. Most XTrinode failures show up in one of those three places before you need provider-level debugging.

kubectl get crd | grep xtrinode
kubectl get pods -n xtrinode-system
kubectl get pods -n xtrinode-gateway
kubectl get xtrinode -A
kubectl get events -A --sort-by='.lastTimestamp'

Symptoms:

  • status.phase stays Reconciling;
  • coordinator or worker resources are missing;
  • route state is not registered.

Checks:

kubectl describe xtrinode analytics -n team-a
kubectl logs -n xtrinode-system -l app.kubernetes.io/name=xtrinode-operator
kubectl get events -n team-a --sort-by='.lastTimestamp'

Likely causes:

  • CRDs or Helm resources are incomplete;
  • selected catalog resources are invalid;
  • referenced secrets are missing;
  • KEDA is enabled but not installed;
  • node-pool provider resources are unhealthy.
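One of the causes above, KEDA enabled but not installed, can be confirmed quickly. The following is a minimal sketch that assumes KEDA registers its usual `scaledobjects.keda.sh` CRD; the function name `check_keda_crd` is made up for illustration.

```shell
# Sketch: report whether the KEDA ScaledObject CRD exists in the cluster.
# Assumption: KEDA is installed with its standard CRD name, scaledobjects.keda.sh.
check_keda_crd() {
  if kubectl get crd scaledobjects.keda.sh >/dev/null 2>&1; then
    echo "KEDA CRD present"
  else
    echo "KEDA CRD missing"
    return 1
  fi
}
```

Run `check_keda_crd` before digging into the operator logs. A missing CRD would explain a runtime that never leaves Reconciling when autoscaling is enabled.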

Checks when requests do not reach the runtime through the gateway:

kubectl get xtrinode analytics -n team-a
kubectl get service trino-analytics -n team-a
kubectl get configmap trino-gateway-routes -n xtrinode-gateway -o yaml
kubectl logs -n xtrinode-gateway -l app.kubernetes.io/name=xtrinode-gateway

Verify:

  • the runtime is not intentionally suspended;
  • the coordinator pod is ready;
  • the route ConfigMap contains the expected route;
  • the X-Trino-XTrinode header or hostname matches the runtime route;
  • gateway auth is not rejecting the request.
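The header and auth items above can be exercised with a single request. This is a rough sketch, not the gateway's documented behavior: it assumes the `X-Trino-XTrinode` header value is the runtime's route name, that the gateway forwards Trino's `/v1/info` endpoint, and that 401/403 come from gateway auth. The function name `probe_route` and the URL you pass in are placeholders.

```shell
# Sketch: send one request through the gateway and classify the HTTP status.
# Assumptions: the header value is the route name; /v1/info is forwarded to Trino.
probe_route() {
  local gateway_url="$1" route="$2" code
  code="$(curl -sS -o /dev/null -w '%{http_code}' \
    -H "X-Trino-XTrinode: ${route}" "${gateway_url}/v1/info")" || code="000"
  case "$code" in
    2??)     echo "routed (HTTP $code)" ;;
    401|403) echo "auth rejected (HTTP $code)" ;;
    000)     echo "gateway unreachable" ;;
    *)       echo "check route and coordinator (HTTP $code)" ;;
  esac
}
```

For example, `probe_route "https://gateway.example.internal" analytics`, where the hostname is hypothetical. Any status other than 2xx, 401, or 403 is reported without interpretation, because the gateway's status mapping is not specified here.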

Checks when workers do not scale:

kubectl get scaledobject -n team-a
kubectl describe scaledobject xtrinode-analytics-workers -n team-a
kubectl logs -n xtrinode-system -l app.kubernetes.io/name=keda-operator

Verify:

  • KEDA is installed;
  • the Prometheus or HTTP trigger is reachable;
  • the query returns a number;
  • minWorkers, maxWorkers, and KEDA threshold are sane;
  • Kubernetes has capacity or the node pool can scale.
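What counts as "sane" is not defined above, so the sketch below picks one plausible reading: `minWorkers` and `maxWorkers` are non-negative integers with `minWorkers` no greater than `maxWorkers`, and the threshold is a positive number. The function name `check_scaling_bounds` is illustrative, and the values are passed in by hand because the exact spec paths are not given here.

```shell
# Sketch: sanity-check scaling values read from the runtime spec.
# Assumed rules: 0 <= minWorkers <= maxWorkers, and threshold > 0.
check_scaling_bounds() {
  local min="$1" max="$2" threshold="$3" v
  for v in "$min" "$max"; do
    case "$v" in
      ''|*[!0-9]*)
        echo "minWorkers and maxWorkers must be non-negative integers"
        return 1 ;;
    esac
  done
  if [ "$min" -gt "$max" ]; then
    echo "minWorkers exceeds maxWorkers"
    return 1
  fi
  # awk handles decimal thresholds, which the shell's integer test cannot.
  if ! awk -v t="$threshold" \
      'BEGIN { exit !(t ~ /^[0-9]+(\.[0-9]+)?$/ && t + 0 > 0) }'; then
    echo "threshold must be a positive number"
    return 1
  fi
  echo "ok"
}
```

For example, `check_scaling_bounds 1 5 10` prints `ok`, while `check_scaling_bounds 6 5 10` reports that `minWorkers` exceeds `maxWorkers`.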

Checks when the coordinator restarts or fails to start:

kubectl logs -n team-a -l app.kubernetes.io/name=trino,app.kubernetes.io/component=coordinator --previous
kubectl describe pods -n team-a -l app.kubernetes.io/name=trino,app.kubernetes.io/component=coordinator

Common causes:

  • invalid catalog configuration;
  • missing secret environment variable;
  • bad Trino config in valuesOverlay;
  • memory limits too low for startup or query planning.
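To tell a memory problem apart from a configuration problem, it can help to read the last termination reason of the coordinator container. The sketch below reuses the label selector from the checks above and the standard pod status field `lastState.terminated.reason`; the function name `coordinator_last_exit` and the hint text are illustrative.

```shell
# Sketch: print each coordinator pod with its last termination reason,
# and add a hint when the kubelet reported OOMKilled.
coordinator_last_exit() {
  local ns="$1" reasons
  reasons="$(kubectl get pods -n "$ns" \
    -l app.kubernetes.io/name=trino,app.kubernetes.io/component=coordinator \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].lastState.terminated.reason}{"\n"}{end}')" || return 1
  printf '%s\n' "$reasons"
  case "$reasons" in
    *OOMKilled*) echo "hint: memory limits may be too low" ;;
  esac
}
```

Call it as `coordinator_last_exit team-a`. A reason such as `Error` instead of `OOMKilled` suggests looking at the `--previous` logs for catalog, secret, or config failures.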

Checks when catalogs are missing or not applied:

kubectl get xtrinodecatalog -n team-a
kubectl get xtrinode analytics -n team-a -o yaml
kubectl get configmap -n team-a | grep catalog
kubectl get secret -n team-a

Verify:

  • spec.catalogSelector.matchLabels matches XTrinodeCatalog.spec.labels;
  • secrets referenced by the catalog live in the same namespace;
  • connector property names match the supported CRD fields;
  • the runtime rolled after the catalog change.
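The first item above is easiest to check with both sides printed together. This sketch uses only the field paths named in that item (`spec.catalogSelector.matchLabels` and `spec.labels`); the function name `show_catalog_matching` is illustrative, and the comparison itself is left to the reader.

```shell
# Sketch: print the runtime's catalog selector next to every catalog's labels
# so mismatched keys or values are visible at a glance.
show_catalog_matching() {
  local name="$1" ns="$2"
  printf 'selector: '
  kubectl get xtrinode "$name" -n "$ns" \
    -o jsonpath='{.spec.catalogSelector.matchLabels}'
  printf '\ncatalogs:\n'
  kubectl get xtrinodecatalog -n "$ns" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.labels}{"\n"}{end}'
}
```

Call it as `show_catalog_matching analytics team-a`. A catalog is expected to be selected only when every selector key and value appears in its labels.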

Checks when node pools do not provision:

kubectl get machinepool -A
kubectl describe machinepool analytics-nodes -n team-a
kubectl get nodes -L xtrinode.io/node-pool,xtrinode.io/runtime
kubectl get events -n team-a --sort-by='.lastTimestamp'

For GCP, confirm the CAPI/CAPG controller is installed, the clusterName matches the workload cluster, IAM roles are present, and project quota allows the requested machine type.

Collect this when escalating:

kubectl get xtrinode analytics -n team-a -o yaml
kubectl describe xtrinode analytics -n team-a
kubectl get xtrinodecatalog -n team-a -o yaml
kubectl get pods -n team-a -o wide
kubectl get scaledobject -n team-a -o yaml
kubectl logs -n xtrinode-system -l app.kubernetes.io/name=xtrinode-operator --tail=300
kubectl logs -n xtrinode-gateway -l app.kubernetes.io/name=xtrinode-gateway --tail=300
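
The commands above can be wrapped so each output lands in its own file. This is a convenience sketch rather than a supported tool; the function name `collect_bundle` and the file names are arbitrary. A failing command does not stop the run, and its error text is kept in the corresponding file.

```shell
# Sketch: write the escalation outputs for one runtime into a directory.
# Arguments: runtime name, namespace, output directory.
collect_bundle() {
  local name="$1" ns="$2" out="$3"
  local op="app.kubernetes.io/name=xtrinode-operator"
  local gw="app.kubernetes.io/name=xtrinode-gateway"
  mkdir -p "$out" || return 1
  kubectl get xtrinode "$name" -n "$ns" -o yaml > "$out/xtrinode.yaml" 2>&1 || true
  kubectl describe xtrinode "$name" -n "$ns" > "$out/xtrinode-describe.txt" 2>&1 || true
  kubectl get xtrinodecatalog -n "$ns" -o yaml > "$out/catalogs.yaml" 2>&1 || true
  kubectl get pods -n "$ns" -o wide > "$out/pods.txt" 2>&1 || true
  kubectl get scaledobject -n "$ns" -o yaml > "$out/scaledobjects.yaml" 2>&1 || true
  kubectl logs -n xtrinode-system -l "$op" --tail=300 > "$out/operator.log" 2>&1 || true
  kubectl logs -n xtrinode-gateway -l "$gw" --tail=300 > "$out/gateway.log" 2>&1 || true
  echo "wrote bundle to $out"
}
```

For example, `collect_bundle analytics team-a ./xtrinode-bundle`. Review the files for secrets or credentials before attaching them to a ticket.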