---
type: source
title: "How to Implement HPA with Object Metrics for Queue-Based Scaling"
author: "OneUptime"
url: https://oneuptime.com/blog/post/2026-02-09-hpa-object-metrics-queue/view
date: 2026-02-09
domain: internet-finance
format: essay
status: enrichment
tags: [pipeline-architecture, kubernetes, autoscaling, queue-based-scaling, KEDA, HPA]
processed_by: rio
processed_date: 2026-03-16
enrichments_applied: ["time-varying-arrival-rates-require-dynamic-staffing-not-constant-max-workers.md", "aimd-scaling-solves-variable-load-expensive-compute-coordination-without-prediction.md"]
extraction_model: "anthropic/claude-sonnet-4.5"
---
# How to Implement HPA with Object Metrics for Queue-Based Scaling
Practical guide to implementing Kubernetes HPA scaling based on queue depth rather than CPU/memory metrics. Covers object metrics, custom metrics, and integration patterns.
## Key Content
- Queue depth is a better scaling signal than CPU for worker-style workloads
- Object metrics in HPA allow scaling on a metric attached to a single named Kubernetes object (e.g. a ConfigMap or custom resource) rather than per-pod CPU/memory
- Pattern: monitor pending messages in queue → scale workers to process them
- Multi-metric HPA: evaluate several metrics simultaneously, scale to whichever requires most replicas
- KEDA (Kubernetes Event Driven Autoscaler): scale-to-zero capability, 70+ built-in scalers
- KEDA pattern: 0 → 1 via event trigger, 1 → N via HPA metrics feed
- Key insight: scale proactively based on how much work is waiting, not reactively based on how busy workers are
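The object-metric pattern above can be sketched as an `autoscaling/v2` manifest. This is illustrative only: it assumes a custom metrics adapter exposes a `pendingMessages` metric on a hypothetical `Queue` custom resource (the resource kind, API group, and metric name are not from the source).

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: worker-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: queue-worker
  minReplicas: 1
  maxReplicas: 20
  metrics:
  - type: Object
    object:
      metric:
        name: pendingMessages        # exposed by a custom metrics adapter (assumption)
      describedObject:
        apiVersion: example.com/v1   # hypothetical Queue custom resource
        kind: Queue
        name: work-queue
      target:
        type: AverageValue
        averageValue: "30"           # target ~30 pending messages per replica
```

With `AverageValue`, the HPA divides the queue's pending-message count by the current replica count and scales until the per-replica value approaches the target, i.e. replicas track how much work is waiting.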
## Relevance to Teleo Pipeline
We don't run Kubernetes, but the patterns are directly transferable to our cron-based system:
1. Replace fixed MAX_WORKERS with queue-depth-based scaling: workers = f(queue_depth)
2. Implement scale-to-zero: if no unprocessed sources, don't spawn workers at all (we already do this)
3. Multi-metric scaling: consider both extract queue depth AND eval queue depth when deciding extraction worker count
4. The proactive scaling insight is key: our dispatcher should look at queue depth, not just worker availability
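Points 1–3 above can be combined into one dispatcher-side function. A minimal sketch, assuming hypothetical parameters (`per_worker_throughput`, `max_workers`) and queue names that are not from the source:

```python
def desired_workers(queue_depths, per_worker_throughput=10, max_workers=20):
    """Compute worker count from queue depth, not worker busyness.

    queue_depths: mapping of queue name -> number of unprocessed items,
    e.g. {"extract": 25, "eval": 5}.
    """
    # Multi-metric scaling: whichever queue demands the most workers wins.
    demand = max(queue_depths.values(), default=0)
    if demand == 0:
        return 0  # scale-to-zero: no pending work, spawn no workers
    # Proportional scaling: ceil(demand / throughput), capped at a safety max.
    needed = -(-demand // per_worker_throughput)
    return min(needed, max_workers)
```

For example, with the defaults, 25 pending extractions and 5 pending evals yield 3 workers, while two empty queues yield 0. The cron dispatcher would call this each tick instead of reading a fixed `MAX_WORKERS`.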
## Key Facts
- KEDA (Kubernetes Event Driven Autoscaler) supports 70+ built-in scalers for different event sources
- KEDA implements scale-to-zero capability: 0→1 replicas via event trigger, 1→N replicas via HPA metrics
- HPA object metrics allow scaling based on custom Kubernetes objects like ConfigMaps and custom resources
- Multi-metric HPA evaluates several metrics simultaneously and scales to whichever requires the most replicas
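The KEDA facts above correspond to a `ScaledObject` resource. A sketch, assuming a RabbitMQ queue; the deployment name, queue name, and target value are placeholders, not from the source:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: worker-scaler
spec:
  scaleTargetRef:
    name: queue-worker          # Deployment to scale (assumption)
  minReplicaCount: 0            # scale-to-zero: 0 -> 1 via the event trigger
  maxReplicaCount: 20           # 1 -> N handled by the HPA KEDA creates
  triggers:
  - type: rabbitmq              # one of KEDA's 70+ built-in scalers
    metadata:
      queueName: work-queue
      mode: QueueLength
      value: "30"               # target pending messages per replica
      hostFromEnv: RABBITMQ_HOST
```

KEDA itself creates and manages the underlying HPA from this spec, which is how it layers scale-to-zero on top of standard HPA metric-driven scaling.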