Scaling

XTrinode has several scaling layers. Keep them separate when designing a deployment:

| Layer | What scales | Main mechanism |
| --- | --- | --- |
| Runtime workers | Trino worker pods for one `XTrinode` runtime | Fixed replicas or KEDA |
| Runtime nodes | Optional dedicated Kubernetes nodes for one runtime | `spec.nodePool` and Cluster API |
| Gateway | Query proxy replicas | Helm HPA or KEDA values |
| API server | Lifecycle API replicas | Helm HPA or KEDA values |

Fixed Workers

Fixed workers are the simplest mode. Use this mode when a runtime has predictable load or when you are validating the platform.

spec:
  minWorkers: 2
  maxWorkers: 2

The operator reconciles coordinator and worker resources directly. No KEDA ScaledObject is required for runtime workers in this mode.

Query-Based KEDA

For elastic runtimes, KEDA can scale workers from query pressure. The preferred Prometheus signal is emitted by the gateway:

sum(xtrinode_gateway_inflight_queries{exported_namespace="<namespace>",xtrinode="<name>"}) or sum(xtrinode_gateway_inflight_queries{namespace="<namespace>",xtrinode="<name>"})

That signal is useful for scale-from-zero because the gateway can observe query demand before worker pods exist.
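As a small illustration, the query above can be filled in for a concrete runtime with a helper like the one below. The function name is hypothetical and not part of XTrinode; it only reproduces the string shown above. The two branches joined by `or` cover both label spellings: Prometheus renames a scraped label to `exported_<label>` when it collides with a target label and `honor_labels` is false.

```python
def gateway_inflight_query(namespace: str, name: str) -> str:
    """Build the gateway in-flight query for one runtime (hypothetical helper)."""
    metric = "xtrinode_gateway_inflight_queries"
    # Branch 1: series whose original namespace label was renamed to exported_namespace.
    exported = f'sum({metric}{{exported_namespace="{namespace}",xtrinode="{name}"}})'
    # Branch 2: series that carry the plain namespace label.
    plain = f'sum({metric}{{namespace="{namespace}",xtrinode="{name}"}})'
    # PromQL "or" keeps branch 1 when it has data and falls back to branch 2 otherwise.
    return f"{exported} or {plain}"


if __name__ == "__main__":
    print(gateway_inflight_query("team-a", "analytics"))
```

Pasting the printed query into the Prometheus expression browser is a quick way to confirm that the gateway metric exists before enabling KEDA.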

spec:
  minWorkers: 0
  maxWorkers: 8
  keda:
    enabled: true
    scalerType: prometheus
    scalingMetric: query
    threshold: "1"
    prometheusServer: http://prometheus-operated.monitoring.svc.cluster.local:9090

You can override spec.keda.prometheusQuery for a custom scaler. The operator substitutes placeholders such as {namespace}, {releaseName}, and {xtrinodeName} before writing the KEDA trigger.
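The placeholder handling can be pictured roughly as follows. This is a sketch under assumptions, not the operator's code: the function name, the sample metric `my_metric`, and its labels are made up for illustration, and only the three placeholder names come from the text above.

```python
def render_query(template: str, namespace: str, release_name: str, xtrinode_name: str) -> str:
    """Replace the documented placeholders in a custom query (illustrative only)."""
    values = {
        "{namespace}": namespace,
        "{releaseName}": release_name,
        "{xtrinodeName}": xtrinode_name,
    }
    # Plain string replacement is used instead of str.format because PromQL
    # label matchers contain literal braces that str.format would reject.
    for placeholder, value in values.items():
        template = template.replace(placeholder, value)
    return template


if __name__ == "__main__":
    custom = 'sum(my_metric{namespace="{namespace}",xtrinode="{xtrinodeName}"})'
    print(render_query(custom, "team-a", "xtrinode-analytics", "analytics"))
```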

HTTP And JMX Scaling

HTTP scaling can be used as a fallback. For query scaling, the default coordinator endpoint inspects Trino query state. The JMX exporter can also be enabled when you want JVM or Trino metrics from a sidecar.

spec:
  keda:
    enabled: true
    scalerType: http
    scalingMetric: query
    httpEndpoint: coordinator

KEDA is opt-in. If spec.keda.enabled is missing, false, or true without a usable scaler configuration, the operator keeps workers fixed and removes stale KEDA objects. For gateway and API server charts, keda.enabled=true also needs at least one enabled trigger, such as cpu, memory, prometheus, http, or a custom trigger, before a ScaledObject is rendered.
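The opt-in rule for runtime workers can be summarized as a small decision function. This is a hedged sketch: the function is hypothetical, and what counts as a "usable scaler configuration" here (a Prometheus server address, or an HTTP endpoint) is an assumption made for illustration, not the operator's actual validation.

```python
from typing import Optional


def worker_scaling_mode(keda: Optional[dict]) -> str:
    """Return "keda" only when KEDA is enabled and looks usable, else "fixed"."""
    # Missing block, enabled missing, or enabled false: workers stay fixed.
    if not keda or keda.get("enabled") is not True:
        return "fixed"
    scaler = keda.get("scalerType")
    # Assumed usability checks; the real operator may require other fields.
    if scaler == "prometheus":
        usable = bool(keda.get("prometheusServer"))
    elif scaler == "http":
        usable = bool(keda.get("httpEndpoint"))
    else:
        usable = False
    return "keda" if usable else "fixed"


if __name__ == "__main__":
    print(worker_scaling_mode({"enabled": True}))
    print(worker_scaling_mode({"enabled": True, "scalerType": "http", "httpEndpoint": "coordinator"}))
```

In the "fixed" case the text above also says stale KEDA objects are removed, which this sketch does not model.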

Suspend, Resume, And Warm Floors

Runtimes can suspend compute while preserving the desired runtime identity.

spec:
  suspended: false
  autoSuspendAfter: 30m
  wakeMinWorkers: 1
  wakeTTL: 10m

autoSuspendAfter lets the platform pause idle runtimes.
wakeMinWorkers prewarms workers after resume.
wakeTTL keeps the prewarm floor briefly after demand returns.
Resume requests are coordinated by the API server with Kubernetes Lease objects so many first queries do not trigger duplicate resume work.
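One way to read the interaction between wakeMinWorkers and wakeTTL is as a temporary worker floor. The sketch below assumes that the floor applies from the resume time until wakeTTL has elapsed, and that it never lowers the configured minimum. Both are assumptions, and the function is hypothetical rather than taken from the platform.

```python
from datetime import datetime, timedelta
from typing import Optional


def effective_min_workers(
    min_workers: int,
    wake_min_workers: int,
    wake_ttl: timedelta,
    resumed_at: Optional[datetime],
    now: datetime,
) -> int:
    """Return the worker floor, raised to wake_min_workers inside the wake window."""
    # Assumed rule: the warm floor holds while less than wake_ttl has passed since resume.
    if resumed_at is not None and now - resumed_at < wake_ttl:
        return max(min_workers, wake_min_workers)
    # Outside the window (or never resumed) the configured minimum applies.
    return min_workers


if __name__ == "__main__":
    resumed = datetime(2024, 1, 1, 12, 0)
    ttl = timedelta(minutes=10)
    print(effective_min_workers(0, 1, ttl, resumed, resumed + timedelta(minutes=5)))
    print(effective_min_workers(0, 1, ttl, resumed, resumed + timedelta(minutes=15)))
```

With minWorkers 0, wakeMinWorkers 1, and wakeTTL 10m, this reading keeps one worker for ten minutes after resume and then lets the runtime scale back toward zero.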

Runtime Node Pools

For stronger isolation, a runtime can request an optional provider node pool. GCP (GKE through CAPG) is the fully exercised, checked-in cloud path for this behavior. Provider resource generation for AWS and Azure exists in the source tree, while live CAPA/CAPZ parity is tracked separately.

spec:
  nodePool:
    name: analytics-nodes
    provider: gcp
    providerMode: managed
    clusterName: xtrinode-gke-test
    kubernetesVersion: v1.35.3
    minNodes: 0
    maxNodes: 10
    nodeLabels:
      xtrinode.io/runtime: analytics
      xtrinode.io/node-pool: analytics-nodes
    gcp:
      machineType: e2-standard-4

When scaleDownOnSuspend is enabled, node-pool minimums can be lowered during suspend and restored on resume. Use this only after validating Cluster Autoscaler and provider quotas for the environment.

Gateway And API Server Replicas

The gateway and API server charts support HPA-style and KEDA-style autoscaling. Choose one autoscaling controller per Deployment; do not enable both for the same component.
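A pre-deployment check for the one-controller rule could look like the sketch below. It assumes the chart values have already been loaded into a dictionary with the autoscaling.enabled and keda.enabled keys used in this section; the function itself is hypothetical and not shipped with the charts.

```python
def check_single_autoscaler(values: dict) -> None:
    """Raise ValueError when both HPA and KEDA are enabled for one component."""
    # "or {}" tolerates keys that are present but empty in the values file.
    hpa_enabled = bool((values.get("autoscaling") or {}).get("enabled", False))
    keda_enabled = bool((values.get("keda") or {}).get("enabled", False))
    if hpa_enabled and keda_enabled:
        raise ValueError("enable either autoscaling (HPA) or keda, not both")


if __name__ == "__main__":
    check_single_autoscaler({"autoscaling": {"enabled": True}, "keda": {"enabled": False}})
    print("values are consistent")
```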

Use HPA when CPU or memory signals are enough:

autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70
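To get a feel for how these values interact, the core HPA rule from the Kubernetes documentation is desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric), clamped to the replica bounds. The sketch below applies that rule to the values above. It deliberately leaves out the controller's tolerance band, stabilization windows, and handling of pods that are not ready, and the function name is illustrative.

```python
import math


def hpa_desired_replicas(
    current_replicas: int,
    current_utilization: float,
    target_utilization: float,
    min_replicas: int,
    max_replicas: int,
) -> int:
    """Apply the basic HPA ratio rule and clamp to [min_replicas, max_replicas]."""
    desired = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 4 replicas averaging 105% of requested CPU against a 70% target.
    print(hpa_desired_replicas(4, 105, 70, 2, 10))
```

Utilization is measured against each pod's CPU request, so the target only works as intended when requests are set on the Deployment.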

Use KEDA when you need Prometheus, HTTP, or event-driven signals, or cooldown and fallback behavior:

keda:
  enabled: true
  minReplicas: 2
  maxReplicas: 10
  prometheus:
    enabled: true
    serverAddress: http://prometheus.monitoring.svc.cluster.local:9090

HPA requires Kubernetes Metrics Server and resource requests on the target Deployment. KEDA CPU and memory scalers also need Metrics Server; Prometheus and HTTP scalers need the referenced endpoints to be reachable from the KEDA operator.

The control-plane charts also expose advanced autoscaling knobs for production tuning. HPA can use additional metrics and scale-up or scale-down behavior fields, including stabilization windows and pod or percentage policies. KEDA can use custom triggers, polling and cooldown intervals, and fallback replica counts when the metrics source is unavailable.

autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70
  scaleDownStabilizationWindowSeconds: 300
  scaleUpStabilizationWindowSeconds: 0

keda:
  enabled: true
  minReplicas: 2
  maxReplicas: 10
  pollingInterval: 30
  cooldownPeriod: 300
  fallback:
    failureThreshold: 3
    replicas: 2

Native HPA for Trino workers is available only through the privileged valuesOverlay.server.autoscaling escape hatch. Prefer typed runtime KEDA settings unless a platform owner has a specific reason to use worker HPA.

Scaling Checks

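# List runtimes in all namespaces, then inspect one runtime
kubectl get xtrinode -A
kubectl describe xtrinode analytics -n team-a
# Check the KEDA ScaledObject for the runtime workers
kubectl get scaledobject -A
kubectl describe scaledobject xtrinode-analytics-workers -n team-a
# Check HPAs and recent events in the runtime namespace
kubectl get hpa -A
kubectl get events -n team-a --sort-by='.lastTimestamp'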

For node-pool issues, also check:

kubectl get machinepool -A
kubectl describe machinepool analytics-nodes -n team-a
kubectl get nodes -L xtrinode.io/node-pool,xtrinode.io/runtime