Scaling
XTrinode has several scaling layers. Keep them separate when designing a deployment:
| Layer | What scales | Main mechanism |
|---|---|---|
| Runtime workers | Trino worker pods for one XTrinode runtime | Fixed replicas or KEDA |
| Runtime nodes | Optional dedicated Kubernetes nodes for one runtime | spec.nodePool and Cluster API |
| Gateway | Query proxy replicas | Helm HPA or KEDA values |
| API server | Lifecycle API replicas | Helm HPA or KEDA values |
Fixed Workers
Fixed workers are the simplest mode. Use this mode when a runtime has predictable load or when you are validating the platform.

    spec:
      minWorkers: 2
      maxWorkers: 2

The operator reconciles coordinator and worker resources directly. No KEDA ScaledObject is required for runtime workers in this mode.
Query-Based KEDA
For elastic runtimes, KEDA can scale workers from query pressure. The preferred Prometheus signal is emitted by the gateway:

    sum(xtrinode_gateway_inflight_queries{exported_namespace="<namespace>",xtrinode="<name>"})
      or
    sum(xtrinode_gateway_inflight_queries{namespace="<namespace>",xtrinode="<name>"})

That signal is useful for scale-from-zero because the gateway can observe query demand before worker pods exist.

    spec:
      minWorkers: 0
      maxWorkers: 8
      keda:
        enabled: true
        scalerType: prometheus
        scalingMetric: query
        threshold: "1"
        prometheusServer: http://prometheus-operated.monitoring.svc.cluster.local:9090

You can set spec.keda.prometheusQuery to override the query for a custom scaler. The operator substitutes placeholders such as {namespace}, {releaseName}, and {xtrinodeName} before writing the KEDA trigger.
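As an illustration of the override, the sketch below replaces the default signal with a smoothed variant. The field names come from the examples above; the query itself is an assumption chosen for the example, not a query shipped with the operator, and whether scalingMetric must still be set alongside a custom query is not stated here.

```yaml
# Sketch only: a custom Prometheus query using the documented placeholders.
spec:
  minWorkers: 0
  maxWorkers: 8
  keda:
    enabled: true
    scalerType: prometheus
    threshold: "1"
    prometheusServer: http://prometheus-operated.monitoring.svc.cluster.local:9090
    # Assumed query: hold the in-flight count at its 2-minute peak so that
    # short gaps between queries do not immediately drop the signal to zero.
    # {namespace} and {xtrinodeName} are filled in by the operator.
    prometheusQuery: 'sum(max_over_time(xtrinode_gateway_inflight_queries{namespace="{namespace}",xtrinode="{xtrinodeName}"}[2m]))'
```

Single quotes keep the braces and double quotes in the PromQL expression from being interpreted by the YAML parser.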
HTTP And JMX Scaling
HTTP scaling can be used as a fallback. For query scaling, the default coordinator endpoint inspects Trino query state. The JMX exporter can also be enabled when you want JVM or Trino metrics from a sidecar.

    spec:
      keda:
        enabled: true
        scalerType: http
        scalingMetric: query
        httpEndpoint: coordinator

KEDA is opt-in. If spec.keda.enabled is missing, false, or true without a usable scaler configuration, the operator keeps workers fixed and removes stale KEDA objects. For gateway and API server charts, keda.enabled=true also needs at least one enabled trigger, such as cpu, memory, prometheus, http, or a custom trigger, before a ScaledObject is rendered.
Suspend, Resume, And Warm Floors
Runtimes can suspend compute while preserving the desired runtime identity.

    spec:
      suspended: false
      autoSuspendAfter: 30m
      wakeMinWorkers: 1
      wakeTTL: 10m

- autoSuspendAfter lets the platform pause idle runtimes.
- wakeMinWorkers prewarms workers after resume.
- wakeTTL keeps the prewarm floor briefly after demand returns.
- Resume requests are coordinated by the API server with Kubernetes Lease objects so many first queries do not trigger duplicate resume work.
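To make the Lease-based coordination concrete, here is a rough sketch of what such an object can look like while one API server replica is handling a resume. The kind and fields are the standard coordination.k8s.io/v1 Lease API; the object name, holder identity, duration, and timestamps are hypothetical and are not taken from the XTrinode source.

```yaml
# Sketch only: a Lease held by one API server replica during a resume.
apiVersion: coordination.k8s.io/v1
kind: Lease
metadata:
  name: xtrinode-resume-analytics        # hypothetical name
  namespace: team-a
spec:
  holderIdentity: xtrinode-api-server-0  # hypothetical replica identity
  leaseDurationSeconds: 30               # assumed; other replicas wait until this expires
  acquireTime: "2024-01-01T00:00:00.000000Z"
  renewTime: "2024-01-01T00:00:10.000000Z"
```

Other replicas that receive a first query for the same runtime can see that the Lease is held and wait for the resume in progress instead of starting another one.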
Runtime Node Pools
For stronger isolation, a runtime can request an optional provider node pool. GCP/GKE/CAPG is the fully exercised checked-in cloud path for this behavior. AWS and Azure provider resource generation exists in the source tree, while live CAPA/CAPZ parity remains tracked separately.

    spec:
      nodePool:
        name: analytics-nodes
        provider: gcp
        providerMode: managed
        clusterName: xtrinode-gke-test
        kubernetesVersion: v1.35.3
        minNodes: 0
        maxNodes: 10
        nodeLabels:
          xtrinode.io/runtime: analytics
          xtrinode.io/node-pool: analytics-nodes
        gcp:
          machineType: e2-standard-4

When scaleDownOnSuspend is enabled, node-pool minimums can be lowered during suspend and restored on resume. Use this only after validating Cluster Autoscaler and provider quotas for the environment.
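The nodeLabels in the example are what allow pods to be steered onto the dedicated nodes. The pod spec that the operator renders is not shown here, so the fragment below is only an assumption about how runtime pods could be pinned to the pool using the standard Kubernetes nodeSelector field.

```yaml
# Hypothetical fragment of a runtime pod template; not taken from the operator source.
spec:
  nodeSelector:
    xtrinode.io/node-pool: analytics-nodes
    xtrinode.io/runtime: analytics
```

With a selector like this, pods stay Pending until a matching node exists, which is the condition Cluster Autoscaler reacts to when the pool is sitting at minNodes: 0.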
Gateway And API Server Replicas
The gateway and API server charts support HPA and KEDA style autoscaling. Choose one autoscaling controller per Deployment; do not enable both for the same component.

Use HPA when CPU or memory signals are enough:

    autoscaling:
      enabled: true
      minReplicas: 2
      maxReplicas: 10
      targetCPUUtilizationPercentage: 70

Use KEDA when you need Prometheus, HTTP, cooldown, fallback, or event-driven signals:

    keda:
      enabled: true
      minReplicas: 2
      maxReplicas: 10
      prometheus:
        enabled: true
        serverAddress: http://prometheus.monitoring.svc.cluster.local:9090

HPA requires Kubernetes Metrics Server and resource requests on the target Deployment. KEDA CPU and memory scalers also need Metrics Server; Prometheus and HTTP scalers need the referenced endpoints to be reachable from the KEDA operator.
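The resource-request requirement exists because a utilization target is computed as a percentage of each container's request. The fragment below is a generic Kubernetes pod template sketch; the container name and the request sizes are assumptions, and the chart key that sets them is not shown in this page.

```yaml
# Sketch only: requests that make a 70% CPU utilization target meaningful.
containers:
  - name: gateway            # hypothetical container name
    resources:
      requests:
        cpu: 500m            # 70% of 500m means HPA aims for about 350m average per pod
        memory: 512Mi
```

Without a CPU request on the container, the HPA has no baseline for the percentage and cannot compute a utilization value for that pod.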
The control-plane charts also expose advanced autoscaling knobs for production tuning. HPA can use additional metrics and scale-up or scale-down behavior fields, including stabilization windows and pod or percentage policies. KEDA can use custom triggers, polling and cooldown intervals, and fallback replica counts when the metrics source is unavailable.

    # HPA tuning
    autoscaling:
      enabled: true
      minReplicas: 2
      maxReplicas: 10
      targetCPUUtilizationPercentage: 70
      scaleDownStabilizationWindowSeconds: 300
      scaleUpStabilizationWindowSeconds: 0

    # KEDA tuning (use instead of HPA, not together with it)
    keda:
      enabled: true
      minReplicas: 2
      maxReplicas: 10
      pollingInterval: 30
      cooldownPeriod: 300
      fallback:
        failureThreshold: 3
        replicas: 2

Native HPA for Trino workers is available only through the privileged valuesOverlay.server.autoscaling escape hatch. Prefer typed runtime KEDA settings unless a platform owner has a specific reason to use worker HPA.
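For orientation, the advanced control-plane values correspond to standard fields on the upstream Kubernetes and KEDA objects. The two manifests below are a sketch of what equivalent objects look like; they are alternatives for one Deployment, not a pair to apply together. Object names, the namespace, the policy values, and the KEDA query and threshold are assumptions, and the manifests the charts actually render may differ.

```yaml
# Option A (sketch): HPA with explicit scale-up and scale-down behavior.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: xtrinode-gateway            # hypothetical name
  namespace: xtrinode-system        # hypothetical namespace
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: xtrinode-gateway          # hypothetical Deployment name
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0     # react to load immediately
      policies:
        - type: Pods
          value: 2                      # assumed: add at most 2 pods per minute
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300   # wait 5 minutes before shrinking
      policies:
        - type: Percent
          value: 50                     # assumed: remove at most half the pods per minute
          periodSeconds: 60
---
# Option B (sketch): KEDA ScaledObject with polling, cooldown, and fallback.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: xtrinode-gateway            # hypothetical name
  namespace: xtrinode-system        # hypothetical namespace
spec:
  scaleTargetRef:
    name: xtrinode-gateway          # hypothetical Deployment name
  minReplicaCount: 2
  maxReplicaCount: 10
  pollingInterval: 30               # seconds between trigger checks
  cooldownPeriod: 300
  fallback:
    failureThreshold: 3             # consecutive failed metric reads before fallback
    replicas: 2                     # replica count used while the source is failing
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc.cluster.local:9090
        query: sum(xtrinode_gateway_inflight_queries)   # assumed query
        threshold: "10"                                 # assumed threshold
```

In upstream KEDA, cooldownPeriod governs the transition back to zero replicas, so it matters mostly when the minimum replica count is 0, and fallback is not applied to CPU or memory triggers.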
Scaling Checks
    kubectl get xtrinode -A
    kubectl describe xtrinode analytics -n team-a
    kubectl get scaledobject -A
    kubectl describe scaledobject xtrinode-analytics-workers -n team-a
    kubectl get hpa -A
    kubectl get events -n team-a --sort-by='.lastTimestamp'

For node-pool issues, also check:

    kubectl get machinepool -A
    kubectl describe machinepool analytics-nodes -n team-a
    kubectl get nodes -L xtrinode.io/node-pool,xtrinode.io/runtime