Scaling

XTrinode has several scaling layers. Keep them separate when designing a deployment:

| Layer | What scales | Main mechanism |
| --- | --- | --- |
| Runtime workers | Trino worker pods for one XTrinode runtime | Fixed replicas or KEDA |
| Runtime nodes | Optional dedicated Kubernetes nodes for one runtime | spec.nodePool and Cluster API |
| Gateway | Query proxy replicas | Helm HPA or KEDA values |
| API server | Lifecycle API replicas | Helm HPA or KEDA values |

Fixed workers are the simplest mode. Use this mode when a runtime has predictable load or when you are validating the platform.

spec:
  minWorkers: 2
  maxWorkers: 2

The operator reconciles coordinator and worker resources directly. No KEDA ScaledObject is required for runtime workers in this mode.

For elastic runtimes, KEDA can scale workers from query pressure. The preferred Prometheus signal is emitted by the gateway:

sum(xtrinode_gateway_inflight_queries{exported_namespace="<namespace>",xtrinode="<name>"}) or sum(xtrinode_gateway_inflight_queries{namespace="<namespace>",xtrinode="<name>"})

That signal is useful for scale-from-zero because the gateway can observe query demand before worker pods exist.

spec:
  minWorkers: 0
  maxWorkers: 8
  keda:
    enabled: true
    scalerType: prometheus
    scalingMetric: query
    threshold: "1"
    prometheusServer: http://prometheus-operated.monitoring.svc.cluster.local:9090
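As a rough illustration of how the threshold turns gateway demand into a worker count, the sketch below applies the standard HPA average-value calculation, ceil(metric / threshold), clamped to minWorkers and maxWorkers. The function name is hypothetical, and the real outcome also depends on KEDA activation and HPA behavior settings that are not modeled here.

```python
import math


def desired_workers(inflight_queries: float, threshold: float,
                    min_workers: int, max_workers: int) -> int:
    """Estimate the worker count for an average-value metric target.

    Hypothetical helper: it mirrors ceil(metric / threshold) and ignores
    HPA tolerance, stabilization windows, and KEDA activation thresholds.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    raw = math.ceil(inflight_queries / threshold)
    # Clamp the raw estimate to the configured worker bounds.
    return max(min_workers, min(max_workers, raw))


# Values from the example spec: minWorkers 0, maxWorkers 8, threshold "1".
for inflight in (0, 1, 5, 20):
    print(inflight, desired_workers(inflight, 1.0, 0, 8))
```

With a threshold of 1, each in-flight query asks for roughly one worker until maxWorkers caps the result. A larger threshold makes scaling less aggressive.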

You can override spec.keda.prometheusQuery for a custom scaler. The operator substitutes placeholders such as {namespace}, {releaseName}, and {xtrinodeName} before writing the KEDA trigger.
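To make the placeholder behavior concrete, here is a minimal sketch of the substitution step. It is not the operator's implementation: the function name and the release name "xtrinode-analytics" are hypothetical, and only the three placeholders named above are handled.

```python
def render_query(template: str, namespace: str, release_name: str,
                 xtrinode_name: str) -> str:
    """Substitute the documented placeholders in a PromQL template.

    str.replace is used instead of str.format because PromQL label
    matchers also use braces, which str.format would try to interpret.
    """
    values = {
        "{namespace}": namespace,
        "{releaseName}": release_name,
        "{xtrinodeName}": xtrinode_name,
    }
    rendered = template
    for placeholder, value in values.items():
        rendered = rendered.replace(placeholder, value)
    return rendered


template = ('sum(xtrinode_gateway_inflight_queries'
            '{namespace="{namespace}",xtrinode="{xtrinodeName}"})')
print(render_query(template, "team-a", "xtrinode-analytics", "analytics"))
```

The PromQL selector braces are left untouched because only exact placeholder tokens are replaced.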

HTTP scaling can be used as a fallback. For query scaling, the default coordinator endpoint inspects Trino query state. The JMX exporter can also be enabled as a sidecar when you want JVM or Trino metrics.

spec:
  keda:
    enabled: true
    scalerType: http
    scalingMetric: query
    httpEndpoint: coordinator

KEDA is opt-in. If spec.keda.enabled is missing, false, or true without a usable scaler configuration, the operator keeps workers fixed and removes stale KEDA objects. For gateway and API server charts, keda.enabled=true also needs at least one enabled trigger, such as cpu, memory, prometheus, http, or a custom trigger, before a ScaledObject is rendered.
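The chart-side rule can be sketched as a small check over the Helm values. This is an illustration only: keda.enabled and keda.prometheus.enabled appear in the examples on this page, while the cpu, memory, http, and triggers key names are assumptions made for the sketch.

```python
from typing import Any, Mapping

# Assumed value keys: only keda.enabled and keda.prometheus.enabled appear
# in the examples on this page; the other names are hypothetical.
BUILT_IN_TRIGGERS = ("cpu", "memory", "prometheus", "http")


def renders_scaled_object(values: Mapping[str, Any]) -> bool:
    """Return True when keda is enabled and at least one trigger is enabled."""
    keda = values.get("keda") or {}
    if not keda.get("enabled", False):
        return False
    built_in = any(
        (keda.get(name) or {}).get("enabled", False)
        for name in BUILT_IN_TRIGGERS
    )
    custom = bool(keda.get("triggers"))  # hypothetical list of custom triggers
    return built_in or custom


print(renders_scaled_object({"keda": {"enabled": True}}))
print(renders_scaled_object(
    {"keda": {"enabled": True, "prometheus": {"enabled": True}}}))
```

The first call has KEDA enabled but no trigger, so no ScaledObject would be rendered under this rule. The second call enables the Prometheus trigger and passes.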

Runtimes can suspend compute while preserving the desired runtime identity.

spec:
  suspended: false
  autoSuspendAfter: 30m
  wakeMinWorkers: 1
  wakeTTL: 10m

  • autoSuspendAfter lets the platform pause idle runtimes.
  • wakeMinWorkers prewarms workers after resume.
  • wakeTTL keeps the prewarm floor briefly after demand returns.
  • Resume requests are coordinated by the API server with Kubernetes Lease objects so many first queries do not trigger duplicate resume work.
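One way to read the suspend and wake fields together is sketched below. This is an interpretation under stated assumptions, not the platform's code: the helper names are hypothetical, durations are assumed to be a whole number followed by s, m, or h, and the wake floor is assumed to be the larger of minWorkers and wakeMinWorkers while the TTL is running.

```python
from datetime import datetime, timedelta
from typing import Optional


def parse_duration(text: str) -> timedelta:
    """Parse simple durations such as "30m", "10m", or "2h" (assumed format)."""
    units = {"s": "seconds", "m": "minutes", "h": "hours"}
    if len(text) < 2 or text[-1] not in units or not text[:-1].isdigit():
        raise ValueError(f"unsupported duration: {text!r}")
    return timedelta(**{units[text[-1]]: int(text[:-1])})


def should_auto_suspend(last_query_at: datetime, now: datetime,
                        auto_suspend_after: str) -> bool:
    """True once the runtime has been idle for at least autoSuspendAfter."""
    return now - last_query_at >= parse_duration(auto_suspend_after)


def worker_floor(min_workers: int, wake_min_workers: int,
                 resumed_at: Optional[datetime], now: datetime,
                 wake_ttl: str) -> int:
    """Minimum worker count, raised to wakeMinWorkers during the wake TTL."""
    if resumed_at is not None and now - resumed_at < parse_duration(wake_ttl):
        return max(min_workers, wake_min_workers)
    return min_workers


t0 = datetime(2024, 1, 1, 12, 0)
print(should_auto_suspend(t0, t0 + timedelta(minutes=31), "30m"))
print(worker_floor(0, 1, t0, t0 + timedelta(minutes=5), "10m"))
print(worker_floor(0, 1, t0, t0 + timedelta(minutes=15), "10m"))
```

In this reading, the prewarm floor applies for the first ten minutes after resume and then falls back to minWorkers, at which point the autoscaler alone decides the worker count.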

For stronger isolation, a runtime can request an optional provider node pool. GCP/GKE/CAPG is the fully exercised checked-in cloud path for this behavior. AWS and Azure provider resource generation exists in the source tree, while live CAPA/CAPZ parity remains tracked separately.

spec:
  nodePool:
    name: analytics-nodes
    provider: gcp
    providerMode: managed
    clusterName: xtrinode-gke-test
    kubernetesVersion: v1.35.3
    minNodes: 0
    maxNodes: 10
    nodeLabels:
      xtrinode.io/runtime: analytics
      xtrinode.io/node-pool: analytics-nodes
    gcp:
      machineType: e2-standard-4

When scaleDownOnSuspend is enabled, node-pool minimums can be lowered during suspend and restored on resume. Use this only after validating Cluster Autoscaler and provider quotas for the environment.

The gateway and API server charts support HPA and KEDA style autoscaling. Choose one autoscaling controller per Deployment; do not enable both for the same component.
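A pre-deployment check for this rule could look like the sketch below. It assumes each component's values carry top-level autoscaling.enabled and keda.enabled flags, as in the examples on this page; the function name is hypothetical.

```python
from typing import Any, List, Mapping


def autoscaling_conflicts(component: str,
                          values: Mapping[str, Any]) -> List[str]:
    """Report an error when HPA and KEDA are both enabled for a component."""
    hpa_enabled = (values.get("autoscaling") or {}).get("enabled", False)
    keda_enabled = (values.get("keda") or {}).get("enabled", False)
    errors = []
    if hpa_enabled and keda_enabled:
        errors.append(
            f"{component}: autoscaling.enabled and keda.enabled are both "
            "true; choose one autoscaling controller"
        )
    return errors


gateway_values = {"autoscaling": {"enabled": True}, "keda": {"enabled": True}}
print(autoscaling_conflicts("gateway", gateway_values))
```

Running a check like this in CI before `helm upgrade` catches the conflict earlier than discovering two controllers competing for the same Deployment.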

Use HPA when CPU or memory signals are enough:

autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70
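For intuition about what these values do, the sketch below applies the documented Kubernetes HPA calculation, desired = ceil(current replicas × current utilization / target utilization), with the default 10% tolerance and the replica bounds from the example. The function name is hypothetical, and stabilization windows and scaling policies are not modeled.

```python
import math


def hpa_desired_replicas(current_replicas: int, current_cpu_percent: float,
                         target_cpu_percent: float = 70.0,
                         min_replicas: int = 2, max_replicas: int = 10,
                         tolerance: float = 0.1) -> int:
    """Approximate the HPA replica calculation for a CPU utilization target."""
    ratio = current_cpu_percent / target_cpu_percent
    if abs(ratio - 1.0) <= tolerance:
        # Within tolerance the HPA leaves the replica count unchanged.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


print(hpa_desired_replicas(4, 90))   # above target: scale up
print(hpa_desired_replicas(4, 72))   # within tolerance: unchanged
print(hpa_desired_replicas(8, 140))  # capped by maxReplicas
```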

Use KEDA when you need Prometheus, HTTP, cooldown, fallback, or event-driven signals:

keda:
  enabled: true
  minReplicas: 2
  maxReplicas: 10
  prometheus:
    enabled: true
    serverAddress: http://prometheus.monitoring.svc.cluster.local:9090

HPA requires Kubernetes Metrics Server and resource requests on the target Deployment. KEDA CPU and memory scalers also need Metrics Server; Prometheus and HTTP scalers need the referenced endpoints to be reachable from the KEDA operator.

The control-plane charts also expose advanced autoscaling knobs for production tuning. HPA can use additional metrics and scale-up or scale-down behavior fields, including stabilization windows and pod or percentage policies. KEDA can use custom triggers, polling and cooldown intervals, and fallback replica counts when the metrics source is unavailable.

# HPA tuning (alternative to the keda block below; enable only one per component)
autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70
  scaleDownStabilizationWindowSeconds: 300
  scaleUpStabilizationWindowSeconds: 0
# KEDA tuning (alternative to the autoscaling block above)
keda:
  enabled: true
  minReplicas: 2
  maxReplicas: 10
  pollingInterval: 30
  cooldownPeriod: 300
  fallback:
    failureThreshold: 3
    replicas: 2
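The fallback settings can be pictured with the simplified model below. It is a sketch, not KEDA's implementation: the class name is hypothetical, the replica count is assumed to hold steady while failures accumulate, and treating the threshold as inclusive is also an assumption.

```python
from typing import Optional


class FallbackTracker:
    """Track consecutive metric failures and apply a fallback replica count."""

    def __init__(self, failure_threshold: int, fallback_replicas: int,
                 initial_replicas: int) -> None:
        self.failure_threshold = failure_threshold
        self.fallback_replicas = fallback_replicas
        self.current = initial_replicas
        self.failures = 0

    def observe(self, computed_replicas: Optional[int]) -> int:
        """Pass None when the metrics source could not be read."""
        if computed_replicas is not None:
            self.failures = 0
            self.current = computed_replicas
        else:
            self.failures += 1
            # Assumption: fallback applies once failures reach the threshold.
            if self.failures >= self.failure_threshold:
                self.current = self.fallback_replicas
        return self.current


tracker = FallbackTracker(failure_threshold=3, fallback_replicas=2,
                          initial_replicas=5)
for sample in (5, None, None, None, 4):
    print(sample, tracker.observe(sample))
```

With failureThreshold 3 and replicas 2, three consecutive failed reads move the component to two replicas, and the next successful read returns control to the computed value.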

Native HPA for Trino workers is available only through the privileged valuesOverlay.server.autoscaling escape hatch. Prefer typed runtime KEDA settings unless a platform owner has a specific reason to use worker HPA.

kubectl get xtrinode -A
kubectl describe xtrinode analytics -n team-a
kubectl get scaledobject -A
kubectl describe scaledobject xtrinode-analytics-workers -n team-a
kubectl get hpa -A
kubectl get events -n team-a --sort-by='.lastTimestamp'

For node-pool issues, also check:

kubectl get machinepool -A
kubectl describe machinepool analytics-nodes -n team-a
kubectl get nodes -L xtrinode.io/node-pool,xtrinode.io/runtime