| type | title | author | url | date | domain | format | status | tags |
|---|---|---|---|---|---|---|---|---|
| source | How to Implement HPA with Object Metrics for Queue-Based Scaling | OneUptime | https://oneuptime.com/blog/post/2026-02-09-hpa-object-metrics-queue/view | 2026-02-09 | internet-finance | essay | unprocessed | |
How to Implement HPA with Object Metrics for Queue-Based Scaling
A practical guide to Kubernetes HPA scaling driven by queue depth rather than CPU/memory utilization. Covers object metrics, custom metrics, and integration patterns.
Key Content
- Queue depth is a better scaling signal than CPU for worker-style workloads
- Object metrics in HPA allow scaling on a metric that describes a single Kubernetes object (e.g. an Ingress or a custom resource representing a queue)
- Pattern: monitor pending messages in queue → scale workers to process them
- Multi-metric HPA: evaluate several metrics simultaneously, scale to whichever requires most replicas
- KEDA (Kubernetes Event-driven Autoscaling): scale-to-zero capability, 70+ built-in scalers
- KEDA pattern: 0 → 1 via event trigger; for 1 → N it creates an HPA and feeds it the event-source metric
- Key insight: scale proactively based on how much work is waiting, not reactively based on how busy workers are
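The object-metric and multi-metric points above can be sketched as an HPA manifest. This is a minimal sketch, not from the article: the `Queue` custom resource, the `queue_messages_ready` metric, and the workload names are all hypothetical, and the object metric assumes a custom-metrics adapter is serving it.

```yaml
# Hypothetical example: scale a worker Deployment on queue depth plus CPU.
# The HPA evaluates every metric and uses whichever demands the most replicas.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: worker-hpa            # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: queue-worker        # hypothetical worker Deployment
  minReplicas: 1
  maxReplicas: 20
  metrics:
  - type: Object              # metric describing a single object, not the pods
    object:
      metric:
        name: queue_messages_ready   # hypothetical adapter-served metric
      describedObject:
        apiVersion: example.com/v1
        kind: Queue                  # hypothetical custom resource
        name: work-queue
      target:
        type: AverageValue
        averageValue: "10"    # aim for ~10 pending messages per replica
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```

With `AverageValue`, desired replicas scale roughly as queue depth divided by the target, which is what makes queue depth a proactive signal.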
Relevance to Teleo Pipeline
We don't run Kubernetes, but the patterns are directly transferable to our cron-based system:
- Replace fixed MAX_WORKERS with queue-depth-based scaling: workers = f(queue_depth)
- Implement scale-to-zero: if no unprocessed sources, don't spawn workers at all (we already do this)
- Multi-metric scaling: consider both extract queue depth AND eval queue depth when deciding extraction worker count
- The proactive scaling insight is key: our dispatcher should look at queue depth, not just worker availability
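The bullets above reduce to a small sizing function the dispatcher could call each cron tick. A minimal sketch, assuming hypothetical names (`MAX_WORKERS`, `ITEMS_PER_WORKER`, the two queue depths); none of this is existing pipeline code.

```python
# Sketch: queue-depth-based worker sizing for a cron dispatcher.
# All constants and names are hypothetical illustrations.
import math

MAX_WORKERS = 8        # hard ceiling, analogous to HPA maxReplicas
ITEMS_PER_WORKER = 10  # target pending items per worker, like an AverageValue target


def desired_workers(queue_depth: int) -> int:
    """workers = f(queue_depth), with scale-to-zero and a ceiling."""
    if queue_depth <= 0:
        return 0  # scale-to-zero: nothing waiting, spawn no workers
    return min(MAX_WORKERS, math.ceil(queue_depth / ITEMS_PER_WORKER))


def desired_extraction_workers(extract_depth: int, eval_depth: int) -> int:
    """Multi-metric rule: scale to whichever queue demands the most workers."""
    return max(desired_workers(extract_depth), desired_workers(eval_depth))
```

The `max` across metrics mirrors HPA's multi-metric behavior, and keying on queue depth rather than worker busyness is the proactive-scaling insight applied directly.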