
Architecture

XTrinode has a control plane that reconciles runtime intent and a query plane that routes SQL traffic to the right Trino coordinator.

flowchart TB
  clients["Trino CLI, BI tools, notebooks, applications"]
  ingress["Ingress or internal Service"]
  gateway["XTrinode Gateway<br/>routing, auth, rate limits, sticky queries"]
  api["API Server<br/>resume, suspend, status, metrics"]
  operator["Operator<br/>XTrinode and XTrinodeCatalog reconciliation"]
  kube["Kubernetes API"]
  routes["Gateway routes ConfigMap"]
  catalogs["XTrinodeCatalog resources<br/>catalog ConfigMaps and Secrets"]
  keda["KEDA<br/>optional worker autoscaling"]
  subgraph runtimes["Runtime namespaces"]
    runtimeA["XTrinode A<br/>Trino coordinator and workers"]
    runtimeB["XTrinode B<br/>Trino coordinator and workers"]
    shared["Shared routing group<br/>multiple Trino backends"]
  end
  clients --> ingress --> gateway
  gateway --> runtimeA
  gateway --> runtimeB
  gateway --> shared
  gateway --> api
  api --> kube
  operator --> kube
  kube --> routes --> gateway
  catalogs --> operator
  kube --> runtimeA
  kube --> runtimeB
  kube --> shared
  keda --> runtimeA
  keda --> runtimeB

| Component | Responsibility |
| --- | --- |
| Operator | Reconciles XTrinode and XTrinodeCatalog resources into Kubernetes resources. |
| API server | Coordinates runtime lifecycle operations such as resume, suspend, status, and control-plane actions. |
| Gateway | Routes Trino client traffic by hostname, X-Trino-XTrinode header, shared pool, or default route. |
| KEDA | Optionally scales worker pools from configured metrics. |
| Cluster API providers | Optionally create per-runtime cloud node pools. |
| XTrinodeCatalog | Keeps catalog declarations separate from runtime lifecycle. |

The gateway is the Trino-facing entrypoint. It can sit behind an ingress for external traffic or be reached as an internal Kubernetes service. It is not an Ingress controller; cloud load balancers, DNS, and TLS edge policy stay outside the gateway.

It provides:

  • hostname-based routing;
  • X-Trino-XTrinode header routing;
  • default-route fallback when no explicit selector is provided;
  • shared-pool load balancing;
  • sticky query routing;
  • optional authentication and rate limiting;
  • active health checks and circuit breaking;
  • auto-resume when a selected backend is paused or unavailable.
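
The selectors in this list can be read as a resolution order. The following is a minimal Python sketch, not the gateway's implementation: it assumes hostname is checked first, then the X-Trino-XTrinode header, then the default route. The precedence, the `RouteTable` shape, and the `select_backend` name are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RouteTable:
    # Field names are hypothetical; the real route schema may differ.
    by_host: dict[str, str] = field(default_factory=dict)
    by_runtime: dict[str, str] = field(default_factory=dict)
    default_backend: Optional[str] = None


def select_backend(table: RouteTable, host: str, headers: dict[str, str]) -> Optional[str]:
    """Return a backend name, or None when an explicit selector does not match."""
    # 1. Hostname-based routing (assumed to take precedence).
    backend = table.by_host.get(host.lower())
    if backend is not None:
        return backend

    # 2. Header-based routing; HTTP header names are case-insensitive.
    normalized = {name.lower(): value for name, value in headers.items()}
    runtime = normalized.get("x-trino-xtrinode")
    if runtime is not None:
        # Assumption: an unknown runtime name does not silently fall back.
        return table.by_runtime.get(runtime)

    # 3. Default-route fallback when no explicit selector was provided.
    return table.default_backend
```

In this sketch a request with neither a matching hostname nor the header lands on `default_backend`, which mirrors the default-route fallback described in the list.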

The operator writes route entries to the trino-gateway-routes ConfigMap in the gateway namespace. Health and metrics endpoints bypass gateway auth and rate limiting so platform probes keep working.

The operator is the reconciliation owner. It resolves catalogs, applies runtime guardrails, renders Trino resources, configures scaling, registers gateway routes, evaluates lifecycle state, and updates status.

The API server handles lifecycle requests that require coordination. Resume requests are gated by Kubernetes Lease objects so that a burst of simultaneous first queries does not stampede the control plane.

When a gateway resume call wins the lease, the API server records resume intent on the runtime and returns retry guidance. If another caller already holds the lease, later callers get retry guidance instead of triggering another resume.
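
The lease gating can be illustrated with an in-memory stand-in. This is a sketch under stated assumptions: `ResumeLease` and `handle_resume` are hypothetical names, the lease duration and retry interval are made-up values, and a real Kubernetes Lease is acquired through the API server with optimistic concurrency rather than a local object.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResumeLease:
    """In-memory stand-in for a Kubernetes Lease (not the real API object)."""
    holder: Optional[str] = None
    expires_at: float = 0.0

    def try_acquire(self, caller: str, now: float, duration: float) -> bool:
        # The lease is free if nobody holds it or the previous hold has expired.
        if self.holder is None or now >= self.expires_at:
            self.holder = caller
            self.expires_at = now + duration
            return True
        return False


def handle_resume(lease: ResumeLease, runtime: dict, caller: str, now: float,
                  duration: float = 30.0, retry_after: int = 5) -> dict:
    """Record resume intent only for the lease winner; everyone gets retry guidance."""
    won = lease.try_acquire(caller, now, duration)
    if won:
        # Only the winner records resume intent on the runtime.
        runtime["resume_requests"] = runtime.get("resume_requests", 0) + 1
    return {"won_lease": won, "retry_after_seconds": retry_after}
```

Both the winner and the later callers receive the same retry guidance; the only difference is whether resume intent was recorded.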

Runtimes move through a declarative lifecycle:

Pending -> Reconciling -> Ready -> Suspending -> Suspended -> Resuming -> Reconciling -> Ready

The expanded operating model also includes route draining, runtime readiness, resume leases, and finalizer-backed cleanup.

| State | Meaning |
| --- | --- |
| Pending | A runtime resource exists and needs initial reconciliation. |
| Reconciling | Kubernetes resources, catalogs, routes, and scaling objects are being applied. |
| Ready | The coordinator is reachable and the gateway can send new queries. |
| Suspending | The controller is enforcing suspended invariants. |
| Suspended | Intent or idle policy has paused compute. |
| Resuming | Demand or an explicit command is bringing compute back. |
| Error | Reconciliation or lifecycle control failed and needs operator attention. |

Deletion is handled by finalizers and route cleanup, but it is not a status.phase value.
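
The lifecycle can be expressed as a small transition table. This sketch encodes only the path shown in the lifecycle line; it assumes Error is reachable from any other phase and does not model recovery from Error, since neither is specified here. `NEXT_PHASES` and `can_transition` are hypothetical names.

```python
# Transitions taken directly from the documented lifecycle path.
NEXT_PHASES = {
    "Pending": {"Reconciling"},
    "Reconciling": {"Ready"},
    "Ready": {"Suspending"},
    "Suspending": {"Suspended"},
    "Suspended": {"Resuming"},
    "Resuming": {"Reconciling"},
}


def can_transition(current: str, target: str) -> bool:
    """Return True if moving from `current` to `target` follows the lifecycle."""
    if target == "Error":
        # Assumption: any known non-Error phase may fail into Error.
        return current in NEXT_PHASES
    return target in NEXT_PHASES.get(current, set())
```

For example, `can_transition("Suspended", "Ready")` is false in this model because a suspended runtime has to pass through Resuming and Reconciling first.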

Gateway route state is more specific than the high-level runtime phase:

| Route state | Meaning |
| --- | --- |
| RUNNING | New queries can be routed to the backend. |
| RESUMING | Resume is in progress; clients should retry later. |
| PAUSED | Compute is intentionally unavailable. |
| DRAINING | Existing sticky queries can continue, but new queries should avoid the backend. |
| REMOVED | The backend has been deregistered. |
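
A sketch of how a gateway might act on these route states follows. The decision labels (`route`, `avoid`, `retry_later`, `unavailable`) and the `routing_decision` function are hypothetical; only the state names and their meanings come from the table.

```python
from enum import Enum


class RouteState(Enum):
    RUNNING = "RUNNING"
    RESUMING = "RESUMING"
    PAUSED = "PAUSED"
    DRAINING = "DRAINING"
    REMOVED = "REMOVED"


def routing_decision(state: RouteState, is_sticky_query: bool) -> str:
    """Map a route state to a (hypothetical) gateway action for one request."""
    if state is RouteState.RUNNING:
        return "route"
    if state is RouteState.DRAINING:
        # Existing sticky queries continue; new queries avoid the backend.
        return "route" if is_sticky_query else "avoid"
    if state is RouteState.RESUMING:
        return "retry_later"
    # PAUSED and REMOVED backends cannot serve the request as-is.
    return "unavailable"
```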

For create and update operations, the operator:

  1. Reads the XTrinode spec.
  2. Resolves selected XTrinodeCatalog resources.
  3. Extracts secret references for catalog credentials.
  4. Applies namespace guardrails before runtime resources.
  5. Waits for requested node-pool readiness before scheduling Trino pods.
  6. Applies services, config maps, service accounts, coordinator, workers, and optional monitoring.
  7. Applies wake TTL and fixed-worker or KEDA scaling resources.
  8. Publishes gateway route state as RESUMING until runtime readiness passes, then switches the route to RUNNING and updates status.
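
The ordering above can be sketched as a single reconcile pass. This is an illustration only: `reconcile_once`, the step names, and the string return values are hypothetical, and each step is assumed to return `False` when it needs to wait (for example, a node pool that is not ready yet).

```python
from typing import Callable

# A step is a (name, callable) pair; the callable returns True when the step is done.
Step = tuple[str, Callable[[], bool]]


def reconcile_once(steps: list[Step],
                   runtime_ready: Callable[[], bool],
                   publish_route: Callable[[str], None]) -> str:
    """Run ordered steps, then publish route state based on runtime readiness."""
    for name, run in steps:
        if not run():
            # Stop at the first unfinished step so later steps never run early.
            return f"requeue: {name}"
    if not runtime_ready():
        # Keep the backend out of normal routing until readiness passes.
        publish_route("RESUMING")
        return "Reconciling"
    publish_route("RUNNING")
    return "Ready"
```

Stopping at the first unfinished step is what keeps Trino pods from being scheduled before guardrails and node pools are in place.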

Catalogs are separated from runtimes so data-source definitions can be reused across teams and compute units.

Secret
-> XTrinodeCatalog
-> catalog ConfigMap with secret-backed placeholders
-> selected by XTrinode runtime
-> mounted into Trino coordinator and workers
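
One way to picture "secret-backed placeholders" is a rendered properties file in which credentials appear only as references. The sketch below assumes the placeholders use Trino's `${ENV:VARIABLE}` substitution syntax and that secrets reach the pods as environment variables; whether XTrinode renders catalogs this way is an assumption, and `render_catalog_properties` is a hypothetical helper.

```python
def render_catalog_properties(plain: dict[str, str], secret_env: dict[str, str]) -> str:
    """Render a catalog properties file without embedding secret values.

    plain:      property name -> literal value
    secret_env: property name -> environment variable holding the secret
    """
    rendered = dict(plain)
    for prop, env_var in secret_env.items():
        # Only a reference to the secret is written, never the secret itself.
        rendered[prop] = "${ENV:" + env_var + "}"
    return "\n".join(f"{key}={value}" for key, value in sorted(rendered.items())) + "\n"


if __name__ == "__main__":
    print(render_catalog_properties(
        {"connector.name": "postgresql", "connection-url": "jdbc:postgresql://db:5432/app"},
        {"connection-password": "PG_PASSWORD"},
    ))
```

Because the ConfigMap holds only placeholders, the same catalog definition can be selected by several runtimes without copying credentials into each one.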

XTrinode supports three worker modes:

  • fixed worker counts for predictable small deployments;
  • KEDA-managed worker pools for dynamic runtime capacity;
  • native HPA-managed workers through the privileged valuesOverlay.server.autoscaling escape hatch.

KEDA can react to metrics, including Prometheus-backed query pressure when that integration is configured. Gateway-observed query pressure is useful for scale-from-zero because workers emit no metrics while the pool is scaled to zero.
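
To make the scale-from-zero point concrete, the sketch below derives a worker count from query pressure observed at the gateway rather than from worker metrics. It is not KEDA's algorithm; `desired_workers`, its parameters, and the per-worker target are hypothetical.

```python
import math


def desired_workers(pending_queries: int, queries_per_worker: int,
                    min_workers: int, max_workers: int) -> int:
    """Derive a worker count from gateway-observed pending queries."""
    if queries_per_worker <= 0:
        raise ValueError("queries_per_worker must be positive")
    # Round up so that even a single pending query requests one worker.
    wanted = math.ceil(pending_queries / queries_per_worker)
    # Clamp to the configured bounds.
    return max(min_workers, min(max_workers, wanted))
```

With `min_workers=0`, the pool stays at zero while the gateway sees no demand, and the first pending query is enough to request a worker even though no worker exists to report metrics.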

| Failure or condition | Expected containment |
| --- | --- |
| One runtime overloads | Gateway routing, worker limits, backend state, and namespace resources isolate other runtimes. |
| First query hits a suspended runtime | Gateway asks the API server to resume and returns retry guidance. |
| Many clients trigger resume together | API server lease gating lets one resume operation win. |
| Runtime is not ready yet | Operator keeps the backend out of normal routing until readiness passes. |
| Cloud node pool cannot provision | XTrinode status and Kubernetes events expose scheduling and provider failures. |
| Gateway route ConfigMap has invalid YAML or no valid entries | Gateway keeps the last-good in-memory routes instead of replacing them with invalid state. |
| Delete is interrupted | Finalizers and route deregistration let reconciliation resume cleanup. |
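
The last-good behaviour for the route ConfigMap can be sketched as a reload that only swaps state after validation succeeds. To stay within the standard library, this example parses JSON instead of YAML through an injectable `parse` function; `RouteStore`, the name-to-URL route shape, and the validation rule are hypothetical.

```python
import json
from typing import Callable


class RouteStore:
    """Holds in-memory routes and replaces them only with valid data."""

    def __init__(self, parse: Callable[[str], object] = json.loads):
        # Assumption: a YAML parser would be injected here in a real gateway.
        self._parse = parse
        self.routes: dict[str, str] = {}

    def reload(self, raw: str) -> bool:
        """Try to load new routes; return False and keep the old ones on failure."""
        try:
            doc = self._parse(raw)
        except ValueError:
            return False  # unparseable document: keep last-good routes
        if not isinstance(doc, dict):
            return False
        valid = {
            name: url
            for name, url in doc.items()
            if isinstance(name, str) and isinstance(url, str)
            and url.startswith(("http://", "https://"))
        }
        if not valid:
            return False  # no valid entries: keep last-good routes
        self.routes = valid
        return True
```

The key design choice is that `self.routes` is assigned only after parsing and validation both succeed, so a bad ConfigMap update never leaves the gateway with an empty route table.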

A common deployment separates:

  • xtrinode-system for the operator and API server;
  • xtrinode-gateway for the query gateway;
  • team namespaces for XTrinode, XTrinodeCatalog, secrets, runtime pods, KEDA objects, and optional node-pool ownership objects.