Getting Started
Start with the path that matches what you are trying to prove.
1. Understand The Runtime Model
An XTrinode resource represents a Trino compute runtime. It is not a new SQL
engine. It is the platform-owned identity around a Trino coordinator, workers,
catalog selection, routing, lifecycle, scaling, and status.
The practical model is:
- platform teams install the control plane once;
- workload teams request named runtimes;
- the operator reconciles Kubernetes resources;
- the gateway gives users a stable Trino endpoint;
- the API server coordinates lifecycle actions such as resume and suspend.
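The reconcile step in the model above can be pictured as a comparison between the declared runtime and what is observed in the cluster. The following is a minimal Python sketch of that general pattern, not the operator's actual code: the `RuntimeSpec` and `Observed` types, the `plan` function, and the rule that a suspended runtime wants zero pods are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuntimeSpec:
    """Hypothetical, simplified view of a declared runtime."""
    min_workers: int
    max_workers: int
    suspended: bool = False


@dataclass(frozen=True)
class Observed:
    """Hypothetical view of what currently exists in the cluster."""
    coordinator_running: bool
    workers: int


def plan(spec: RuntimeSpec, observed: Observed) -> list[str]:
    """Return the actions needed to move the observed state toward the spec."""
    actions = []
    if spec.suspended:
        # Assumption: suspending a runtime means stopping all of its pods.
        if observed.workers > 0:
            actions.append("scale workers to 0")
        if observed.coordinator_running:
            actions.append("stop coordinator")
        return actions
    if not observed.coordinator_running:
        actions.append("start coordinator")
    # Keep the worker count inside the declared bounds.
    target = min(max(observed.workers, spec.min_workers), spec.max_workers)
    if target != observed.workers:
        actions.append(f"scale workers to {target}")
    return actions
```

Running `plan` repeatedly until it returns an empty list is the essence of a level-triggered reconcile loop: the same inputs always produce the same actions, so retries are safe.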
2. Try XTrinode Locally
Use the local k3d and Tilt workflow from the main XTrinode repository when you want to see the operator, gateway, API server, KEDA, Redis, and a real Trino deployment working together.
The local path is best for:
- understanding reconciliation behavior;
- testing gateway routing and resume behavior;
- iterating on operator or gateway code;
- validating example manifests before moving to cloud infrastructure.
3. Install The Platform
Use the umbrella Helm chart to install the control-plane components and custom resource definitions into Kubernetes:
- xtrinode-operator for reconciliation;
- xtrinode-api-server for lifecycle operations;
- xtrinode-gateway for Trino query routing;
- CRDs for XTrinode and XTrinodeCatalog;
- optional KEDA resources.
Keep chart versions and component image versions aligned for the release you deploy. Control-plane images are published through GHCR. See Versioning and Packaging.
4. Request A Runtime
Create an XTrinode custom resource in the namespace owned by the team or
workload. Start with a small runtime and add catalogs, routing, KEDA, or a
node-pool spec when those boundaries are needed.
```yaml
apiVersion: analytics.xtrinode.io/v1
kind: XTrinode
metadata:
  name: analytics
  namespace: team-a
spec:
  size: s
  minWorkers: 0
  maxWorkers: 2
  suspended: false
  routing:
    hostnameDomain: trino.example.com
```
The size field selects the starting machine and pod-resource profile. Use it
as the first sizing decision before reaching for low-level overrides.
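Before applying a manifest, it can help to sanity-check the worker bounds. The sketch below is a hypothetical pre-flight check in Python, written against the field names from the example above. The `check_runtime_spec` helper and its rules are assumptions; the CRD schema or the operator may enforce different or additional constraints.

```python
def check_runtime_spec(spec: dict) -> list[str]:
    """Return human-readable problems found in a runtime spec dict."""
    problems = []
    min_workers = spec.get("minWorkers")
    max_workers = spec.get("maxWorkers")
    for field, value in (("minWorkers", min_workers), ("maxWorkers", max_workers)):
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(value, int) or isinstance(value, bool) or value < 0:
            problems.append(f"{field} must be a non-negative integer")
    # Only compare the bounds once both are known to be valid integers.
    if not problems and min_workers > max_workers:
        problems.append("minWorkers must not exceed maxWorkers")
    return problems


# The spec block from the example manifest, as a plain dict.
spec = {"size": "s", "minWorkers": 0, "maxWorkers": 2, "suspended": False}
print(check_runtime_spec(spec))
```

An empty list means the checks passed; any returned strings describe what to fix before applying the resource.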
Common first choices:
| Workload shape | Runtime shape |
|---|---|
| Small ad hoc SQL | size: s, fixed low worker count |
| Intermittent analytics | minWorkers: 0, KEDA query scaling |
| Team-isolated production runtime | namespace boundary, catalog selector, dedicated route |
| Cost or blast-radius isolation | optional spec.nodePool |
5. Attach Catalogs
Define XTrinodeCatalog resources for the data sources a team should query,
then select those catalogs from the runtime. Secrets remain Kubernetes secrets;
catalog resources reference them instead of embedding credentials.
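A label selector such as the one used in this section picks catalogs by their labels. Assuming catalogSelector follows the standard Kubernetes matchLabels semantics (every listed key/value pair must be present on the target's labels), the selection can be sketched in Python as follows. The catalog names and labels are made up for illustration, and `select_catalogs` is a hypothetical helper, not part of XTrinode.

```python
def matches(match_labels: dict, labels: dict) -> bool:
    """True when every selector pair is present in the object's labels."""
    return all(labels.get(key) == value for key, value in match_labels.items())


def select_catalogs(match_labels: dict, catalogs: dict) -> list:
    """Return the names of catalogs whose labels satisfy the selector."""
    return [name for name, labels in catalogs.items() if matches(match_labels, labels)]


# Hypothetical XTrinodeCatalog objects, reduced to name -> labels.
catalogs = {
    "warehouse": {"team": "analytics", "tier": "prod"},
    "events": {"team": "analytics"},
    "billing": {"team": "finance"},
}

selected = select_catalogs({"team": "analytics"}, catalogs)
print(selected)
```

Extra labels on a catalog do not prevent a match; only the pairs named in the selector are compared.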
```yaml
spec:
  catalogSelector:
    matchLabels:
      team: analytics
```
6. Route Queries
Connect through the gateway by hostname, X-Trino-XTrinode header, or a default
route. Dedicated runtimes normally get dedicated routing groups. Shared pools can
load-balance multiple runtime backends.
Hostname routing is the cleanest user experience when you can create DNS names:
```yaml
spec:
  routing:
    hostnameDomain: trino.example.com
```
That gives a stable runtime hostname such as:
```shell
trino --server https://team-a--analytics.trino.example.com
```
The generated hostname uses the routing group. For a dedicated runtime without
an explicit routingGroup, the operator derives namespace--name.
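The naming rule just described can be written down as a small Python sketch. The function names are hypothetical; the only behavior assumed is what the text states: the routing group defaults to namespace--name, and the hostname is the routing group followed by the configured hostnameDomain.

```python
from typing import Optional


def routing_group(namespace: str, name: str, explicit: Optional[str] = None) -> str:
    """Use the explicit routingGroup if set, else derive namespace--name."""
    return explicit if explicit else f"{namespace}--{name}"


def runtime_hostname(group: str, hostname_domain: str) -> str:
    """Join the routing group and the configured hostnameDomain."""
    return f"{group}.{hostname_domain}"


host = runtime_hostname(routing_group("team-a", "analytics"), "trino.example.com")
print(host)  # team-a--analytics.trino.example.com
```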
Header routing is useful when several runtimes share the same gateway hostname:
```yaml
spec:
  routing:
    header: X-Trino-XTrinode=team-a/analytics
```
Then connect through the shared gateway endpoint:
```shell
trino \
  --server https://trino-gateway.example.com \
  --http-header "X-Trino-XTrinode: team-a/analytics"
```
Shared pools use one routingGroup across multiple runtimes:
```yaml
spec:
  routing:
    routingGroup: shared-analytics
```
Use shared pools when several compatible runtimes can load-balance query traffic. Use dedicated routing when a team or workload needs clear ownership, isolation, or troubleshooting boundaries.
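To make the difference between shared and dedicated routing concrete, the Python sketch below groups runtimes into backend pools by routing group and then cycles through one pool. The runtime names are invented, and round-robin is an assumption chosen for simplicity; the gateway's actual balancing policy may differ.

```python
from collections import defaultdict
from itertools import cycle


def build_pools(runtimes):
    """Group runtime backends by routing group."""
    pools = defaultdict(list)
    for rt in runtimes:
        # Dedicated runtimes without an explicit group fall back to namespace--name.
        group = rt.get("routingGroup") or f"{rt['namespace']}--{rt['name']}"
        pools[group].append(f"{rt['namespace']}/{rt['name']}")
    return dict(pools)


# Hypothetical runtimes: two share a pool, one is dedicated.
runtimes = [
    {"namespace": "team-a", "name": "analytics-1", "routingGroup": "shared-analytics"},
    {"namespace": "team-a", "name": "analytics-2", "routingGroup": "shared-analytics"},
    {"namespace": "team-b", "name": "reports"},
]

pools = build_pools(runtimes)
# Round-robin is an assumption; the real gateway may balance differently.
picker = cycle(pools["shared-analytics"])
first_three = [next(picker) for _ in range(3)]
print(first_three)
```

The dedicated runtime ends up alone in its own pool, which is what gives it a clear ownership and troubleshooting boundary.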
7. Pick A Deployment Path
For the current platform positioning:
- use GCP/GKE/CAPG when you want the fully exercised checked-in cloud path;
- use generic Kubernetes when you already have a cluster and registry access;
- use AWS or Azure when you are ready to work with the provider-specific Terraform and registry/deploy material while live CAPA/CAPZ parity continues to mature.
| Path | Use when | Start here |
|---|---|---|
| Local k3d/Tilt | You want to validate operator, gateway, API server, KEDA, Redis, and real Trino behavior quickly. | Main repository local development workflow |
| Generic Kubernetes | You already have cluster, registry, ingress, DNS, and Helm ownership. | Deployment |
| GCP/GKE/CAPG | You want the fully exercised cloud path with Terraform, GKE, Artifact Registry, CAPG, runtime node pools, and KEDA/resume smoke coverage. | GCP |
| AWS/EKS/CAPA | You want the experimental provider-validation path with checked-in Terraform, registry/deploy material, and tested provider resource generation. | AWS |
| Azure/AKS/CAPZ | You want the experimental provider-validation path with checked-in Terraform, registry/deploy material, and tested provider resource generation. | Azure |
Current component image tags are listed in Versioning.
For a first GCP deployment from the main repository, use the ordered from-scratch sequence:
```shell
make gcp-management-up
make gcp-images-push
make gcp-control-plane-deploy
```
make gcp-management-up creates the GKE management cluster first, configures
kubectl, then applies the Kubernetes and cloud add-ons. If the GKE
infrastructure already exists and images are already pushed, make deploy-gcp
is fine for redeploying the control plane.
8. Add Elastic Scaling And Machine Sizing
Start with fixed workers, then enable KEDA once you have Prometheus or another scaling signal ready.
```yaml
spec:
  minWorkers: 0
  maxWorkers: 8
  keda:
    enabled: true
    scalerType: prometheus
    scalingMetric: query
    threshold: "1"
    prometheusServer: http://prometheus-operated.monitoring.svc.cluster.local:9090
```
When you need node-level isolation or chargeback on GCP, add a managed node pool spec. The runtime remains the user-facing unit; the node pool is just the capacity boundary behind it.
```yaml
spec:
  size: s
  nodePool:
    name: analytics-nodes
    provider: gcp
    providerMode: managed
    clusterName: xtrinode-gke-test
    kubernetesVersion: v1.35.3
    minNodes: 0
    maxNodes: 10
    gcp:
      machineType: e2-standard-4
```
Read Scaling when choosing between fixed workers, KEDA query
scaling, scale-from-zero, warm floors, and node-pool behavior. Read
Sizing and overrides before using valuesOverlay or
changing provider machine types.
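To build intuition for the KEDA settings in this section, the Python sketch below applies the usual Horizontal Pod Autoscaler average-value rule: desired replicas are the metric value divided by the threshold, rounded up, then clamped to the worker bounds. It assumes the query metric is a count of active queries and ignores stabilization windows, cooldown periods, and activation thresholds; `desired_workers` is a hypothetical helper, not an XTrinode or KEDA API.

```python
import math


def desired_workers(metric_value: float, threshold: float,
                    min_workers: int, max_workers: int) -> int:
    """Estimate the worker count for a metric reading."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    # ceil(metric / threshold), then clamp into [min_workers, max_workers].
    raw = math.ceil(metric_value / threshold)
    return max(min_workers, min(raw, max_workers))


# threshold "1", minWorkers 0, maxWorkers 8, as in the KEDA example.
for queries in (0, 1, 5, 20):
    print(queries, desired_workers(queries, 1, 0, 8))
```

With a threshold of 1, each active query asks for roughly one worker until maxWorkers caps the result, and an idle runtime can fall back to zero.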