Architecture

XTrinode has a control plane that reconciles runtime intent and a query plane that routes SQL traffic to the right Trino coordinator.

Architecture Diagram

flowchart TB
  clients["Trino CLI, BI tools, notebooks, applications"]
  ingress["Ingress or internal Service"]
  gateway["XTrinode Gateway routing, auth, rate limits, sticky queries"]
  api["API Server resume, suspend, status, metrics"]
  operator["Operator XTrinode and XTrinodeCatalog reconciliation"]
  kube["Kubernetes API"]
  routes["Gateway routes ConfigMap"]
  catalogs["XTrinodeCatalog resources catalog ConfigMaps and Secrets"]
  keda["KEDA optional worker autoscaling"]

  subgraph runtimes["Runtime namespaces"]
    runtimeA["XTrinode A Trino coordinator and workers"]
    runtimeB["XTrinode B Trino coordinator and workers"]
    shared["Shared routing group multiple Trino backends"]
  end

  clients --> ingress --> gateway
  gateway --> runtimeA
  gateway --> runtimeB
  gateway --> shared
  gateway --> api
  api --> kube
  operator --> kube
  kube --> routes --> gateway
  catalogs --> operator
  kube --> runtimeA
  kube --> runtimeB
  kube --> shared
  keda --> runtimeA
  keda --> runtimeB

Component Overview

Component	Responsibility
Operator	Reconciles `XTrinode` and `XTrinodeCatalog` resources into Kubernetes resources.
API server	Coordinates runtime lifecycle operations such as resume, suspend, and status queries, along with other control-plane actions.
Gateway	Routes Trino client traffic by hostname, `X-Trino-XTrinode` header, shared pool, or default route.
KEDA	Optionally scales worker pools from configured metrics.
Cluster API providers	Optionally create per-runtime cloud node pools.
XTrinodeCatalog	Keeps catalog declarations separate from runtime lifecycle.

Query Plane

The gateway is the Trino-facing entrypoint. It can sit behind an ingress for external traffic or be reached as an internal Kubernetes service. It is not an Ingress controller; cloud load balancers, DNS, and TLS edge policy stay outside the gateway.

It provides:

hostname-based routing;
X-Trino-XTrinode header routing;
default-route fallback when no explicit selector is provided;
shared-pool load balancing;
sticky query routing;
optional authentication and rate limiting;
active health checks and circuit breaking;
auto-resume when a selected backend is paused or unavailable.

The operator writes route entries to the trino-gateway-routes ConfigMap in the gateway namespace. Health and metrics endpoints bypass gateway auth and rate limiting so platform probes keep working.
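
The selection rules above can be sketched as a small lookup. This is a minimal illustration, not the gateway's implementation: the `RouteTable` structure, its field names, and the precedence order (hostname, then header, then default) are assumptions made for the example, and shared-pool balancing is left out.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RouteTable:
    # Hypothetical in-memory view of the gateway routes; field names are illustrative.
    by_host: dict[str, str] = field(default_factory=dict)
    by_runtime: dict[str, str] = field(default_factory=dict)
    default_backend: Optional[str] = None


def select_backend(table: RouteTable, host: str, headers: dict[str, str]) -> Optional[str]:
    """Pick a backend for one request; returns None when nothing matches."""
    # Strip an optional port from the Host value before matching.
    hostname = host.split(":", 1)[0].lower()
    if hostname in table.by_host:
        return table.by_host[hostname]

    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    runtime = normalized.get("x-trino-xtrinode")
    if runtime and runtime in table.by_runtime:
        return table.by_runtime[runtime]

    # No explicit selector matched: fall back to the default route, if any.
    return table.default_backend
```

A request with no matching hostname and no `X-Trino-XTrinode` header falls through to `default_backend`, which mirrors the default-route fallback described above.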

Control Plane

The operator is the reconciliation owner. It resolves catalogs, applies runtime guardrails, renders Trino resources, configures scaling, registers gateway routes, evaluates lifecycle state, and updates status.

The API server handles lifecycle requests that require coordination. Resume requests use Kubernetes Lease objects so many simultaneous first queries do not stampede the control plane.

When a gateway resume call wins the lease, the API server records resume intent on the runtime and returns retry guidance. If another caller already holds the lease, later callers get retry guidance instead of triggering another resume.
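
The lease behavior can be illustrated with an in-memory stand-in. This sketch assumes a hypothetical `ResumeGate` class, a 30-second lease, and a 5-second retry hint; the real API server relies on Kubernetes Lease objects and the API server's own concurrency control, which a local dictionary does not reproduce.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    holder: str
    expires_at: float


class ResumeGate:
    """In-memory stand-in for one Kubernetes Lease per runtime (illustrative only)."""

    def __init__(self, lease_seconds: float = 30.0, retry_after_seconds: int = 5):
        self.lease_seconds = lease_seconds
        self.retry_after_seconds = retry_after_seconds
        self.leases: dict[str, Lease] = {}
        self.resume_intents: list[str] = []

    def request_resume(self, runtime: str, caller: str, now: Optional[float] = None) -> dict:
        now = time.monotonic() if now is None else now
        lease = self.leases.get(runtime)
        if lease is None or lease.expires_at <= now:
            # This caller wins the lease, so resume intent is recorded once.
            self.leases[runtime] = Lease(caller, now + self.lease_seconds)
            self.resume_intents.append(runtime)
            triggered = True
        else:
            # Another caller holds the lease: no second resume is started.
            triggered = False
        # Both outcomes return retry guidance to the caller.
        return {"resume_triggered": triggered, "retry_after_seconds": self.retry_after_seconds}
```

Every caller receives retry guidance either way; only the lease winner causes resume intent to be recorded.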

Runtime Lifecycle

Runtimes move through a declarative lifecycle:

Pending -> Reconciling -> Ready -> Suspending -> Suspended -> Resuming -> Reconciling -> Ready

The expanded operating model also includes route draining, runtime readiness, resume leases, and finalizer-backed cleanup.

State	Meaning
Pending	A runtime resource exists and needs initial reconciliation.
Reconciling	Kubernetes resources, catalogs, routes, and scaling objects are being applied.
Ready	The coordinator is reachable and the gateway can send new queries.
Suspending	The controller is enforcing the invariants of the suspended state before the runtime is reported as Suspended.
Suspended	Intent or idle policy has paused compute.
Resuming	Demand or an explicit command is bringing compute back.
Error	Reconciliation or lifecycle control failed and needs operator attention.

Deletion is handled by finalizers and route cleanup, but it is not a status.phase value.
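
One way to read the lifecycle is as a transition table. The sketch below encodes only the chain documented above; treating `Error` as reachable from any other phase is an assumption, and recovery out of `Error` is deliberately not modeled because this page does not describe it.

```python
from enum import Enum


class Phase(str, Enum):
    PENDING = "Pending"
    RECONCILING = "Reconciling"
    READY = "Ready"
    SUSPENDING = "Suspending"
    SUSPENDED = "Suspended"
    RESUMING = "Resuming"
    ERROR = "Error"


# Transitions taken directly from the documented lifecycle chain.
DOCUMENTED = {
    Phase.PENDING: {Phase.RECONCILING},
    Phase.RECONCILING: {Phase.READY},
    Phase.READY: {Phase.SUSPENDING},
    Phase.SUSPENDING: {Phase.SUSPENDED},
    Phase.SUSPENDED: {Phase.RESUMING},
    Phase.RESUMING: {Phase.RECONCILING},
}


def can_transition(current: Phase, target: Phase) -> bool:
    if target is Phase.ERROR:
        # Assumption: any non-error phase may fail into Error.
        return current is not Phase.ERROR
    return target in DOCUMENTED.get(current, set())
```

A check such as `can_transition(Phase.SUSPENDED, Phase.READY)` returns `False`, because a suspended runtime passes through `Resuming` and `Reconciling` before it is `Ready` again.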

Gateway route state is more specific than the high-level runtime phase:

Route state	Meaning
`RUNNING`	New queries can be routed to the backend.
`RESUMING`	Resume is in progress; clients should retry later.
`PAUSED`	Compute is intentionally unavailable.
`DRAINING`	Existing sticky queries can continue, but new queries should avoid the backend.
`REMOVED`	The backend has been deregistered.
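
The route states map naturally onto a per-request decision. The function below is a sketch under assumptions: the action names (`forward`, `avoid`, `retry_later`, `request_resume`, `reject`) are invented for illustration, and what the gateway does after "avoiding" a draining backend is not specified here.

```python
def route_action(state: str, is_sticky_query: bool) -> str:
    """Map a gateway route state to a hypothetical action for one request."""
    if state == "RUNNING":
        return "forward"
    if state == "DRAINING":
        # Existing sticky queries may continue; new queries avoid the backend.
        return "forward" if is_sticky_query else "avoid"
    if state == "RESUMING":
        return "retry_later"
    if state == "PAUSED":
        # Assumption: a paused backend is where auto-resume gets requested.
        return "request_resume"
    if state == "REMOVED":
        return "reject"
    raise ValueError(f"unknown route state: {state!r}")
```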

Runtime Reconciliation

For create and update operations, the operator:

Reads the XTrinode spec.
Resolves selected XTrinodeCatalog resources.
Extracts secret references for catalog credentials.
Applies namespace guardrails before runtime resources.
Waits for requested node-pool readiness before scheduling Trino pods.
Applies services, config maps, service accounts, coordinator, workers, and optional monitoring.
Applies wake TTL and fixed-worker or KEDA scaling resources.
Publishes gateway route state as RESUMING until runtime readiness passes, then switches the route to RUNNING and updates status.
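
The ordering and the final route switch can be sketched as a single pass over named steps. The function name `reconcile_once`, the step tuples, and the returned dictionary are hypothetical; error handling, requeueing, and status updates are omitted.

```python
from typing import Callable

# A step is a label plus a callable that applies it; both are illustrative.
Step = tuple[str, Callable[[], None]]


def reconcile_once(steps: list[Step], runtime_ready: Callable[[], bool]) -> dict:
    """Apply steps in order, then pick the route state from runtime readiness."""
    applied = []
    for name, apply in steps:
        apply()  # an exception here stops the pass before later steps run
        applied.append(name)
    # The route stays RESUMING until the readiness check passes.
    route_state = "RUNNING" if runtime_ready() else "RESUMING"
    return {"applied": applied, "route_state": route_state}
```

Because steps run strictly in list order, placing the guardrail step before the workload step reproduces the "guardrails before runtime resources" rule from the list above.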

Catalog Flow

Catalogs are separated from runtimes so data-source definitions can be reused across teams and compute units.

Secret
  -> XTrinodeCatalog
  -> catalog ConfigMap with secret-backed placeholders
  -> selected by XTrinode runtime
  -> mounted into Trino coordinator and workers
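
As an illustration of secret-backed placeholders, the sketch below renders a catalog properties file in which credential values are replaced by environment-variable references. It assumes Trino's `${ENV:NAME}` substitution syntax as the placeholder form; whether XTrinode uses that form, and the `render_catalog` helper and its naming scheme, are assumptions made for the example.

```python
import re


def render_catalog(name: str, properties: dict[str, str],
                   secret_refs: dict[str, tuple[str, str]]) -> tuple[str, dict]:
    """Render catalog properties text plus the env vars that must be injected.

    secret_refs maps a property key to a (secret_name, secret_key) pair.
    """
    lines = [f"{key}={value}" for key, value in sorted(properties.items())]
    env: dict[str, tuple[str, str]] = {}
    for prop, ref in sorted(secret_refs.items()):
        # Derive a shell-safe variable name from the catalog and property names.
        env_name = re.sub(r"[^A-Z0-9]", "_", f"{name}_{prop}".upper())
        env[env_name] = ref
        # The secret value never appears in the rendered text, only a placeholder.
        lines.append(f"{prop}=${{ENV:{env_name}}}")
    return "\n".join(lines) + "\n", env
```

The rendered text is safe to store in a ConfigMap because it carries only references; the returned mapping says which Secret key would back each environment variable on the coordinator and worker pods.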

Scaling Model

XTrinode supports three worker modes:

fixed worker counts for predictable small deployments;
KEDA-managed worker pools for dynamic runtime capacity;
native HPA-managed workers through the privileged valuesOverlay.server.autoscaling escape hatch.

KEDA can react to metrics, including Prometheus-backed query pressure when that integration is configured. Gateway-observed query pressure is useful for scale-from-zero because no worker metrics exist while the worker count is zero.
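
A simplified view of pressure-based sizing is shown below. This is not KEDA's algorithm: the function name, the `queries_per_worker` target, and the clamping bounds are assumptions chosen to show why a gateway-side signal can lift the worker count off zero.

```python
import math


def desired_workers(queued_queries: int, queries_per_worker: int,
                    min_workers: int, max_workers: int) -> int:
    """Size the worker pool from observed query pressure, clamped to bounds."""
    if queries_per_worker <= 0:
        raise ValueError("queries_per_worker must be positive")
    # Round up so that any pending work asks for at least one worker.
    wanted = math.ceil(queued_queries / queries_per_worker)
    return max(min_workers, min(max_workers, wanted))
```

With `min_workers=0`, a single query observed at the gateway already yields one worker, even though no worker exists yet to report metrics.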

Failure Boundaries

Failure or condition	Expected containment
One runtime overloads	Gateway routing, worker limits, backend state, and namespace resources isolate other runtimes.
First query hits a suspended runtime	Gateway asks the API server to resume and returns retry guidance.
Many clients trigger resume together	API server lease gating lets one resume operation win.
Runtime is not ready yet	Operator keeps the backend out of normal routing until readiness passes.
Cloud node pool cannot provision	XTrinode status and Kubernetes events expose scheduling and provider failures.
Gateway route ConfigMap has invalid YAML or no valid entries	Gateway keeps the last-good in-memory routes instead of replacing them with invalid state.
Delete is interrupted	Finalizers and route deregistration let reconciliation resume cleanup.
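
The last-good behavior for the route ConfigMap can be sketched as a cache that only swaps in a new table after it parses and validates. The `RouteCache` class and the `backend` field are hypothetical, and `json.loads` stands in for a YAML parser so the example needs only the standard library.

```python
import json
from typing import Callable


class RouteCache:
    """Keeps the last valid route table when an update cannot be used."""

    def __init__(self, parse: Callable[[str], object] = json.loads):
        # Stand-in parser; a YAML loader would be injected here instead.
        self._parse = parse
        self.routes: dict[str, dict] = {}

    def update(self, raw: str) -> bool:
        """Return True if the new routes were accepted, False if last-good is kept."""
        try:
            parsed = self._parse(raw)
        except ValueError:
            return False  # unparseable input: keep the current routes
        if not isinstance(parsed, dict):
            return False
        # Keep only entries that look usable (assumed shape: a dict with "backend").
        valid = {
            name: entry for name, entry in parsed.items()
            if isinstance(entry, dict) and entry.get("backend")
        }
        if not valid:
            return False  # nothing valid: do not replace good routes with empty state
        self.routes = valid
        return True
```

Replacing `self.routes` in a single assignment, and only after validation, is what keeps a bad update from ever being visible to request handling.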

Platform Namespaces

A common deployment separates:

xtrinode-system for the operator and API server;
xtrinode-gateway for the query gateway;
team namespaces for XTrinode, XTrinodeCatalog, secrets, runtime pods, KEDA objects, and optional node-pool ownership objects.