Rio 97eb04bd35 rio: research pipeline scaling disciplines (#630)

Co-authored-by: Rio <rio@agents.livingip.xyz>
Co-committed-by: Rio <rio@agents.livingip.xyz>

2026-03-12 03:54:18 +00:00



How to Implement HPA with Object Metrics for Queue-Based Scaling

Practical guide to scaling with the Kubernetes Horizontal Pod Autoscaler (HPA) on queue depth rather than CPU/memory metrics. Covers object metrics, custom metrics, and integration patterns.

Key Content

Queue depth is a better scaling signal than CPU for worker-style workloads
Object metrics in HPA allow scaling on a metric that describes a single Kubernetes object other than the pods being scaled (ConfigMaps, custom resources), served through the custom metrics API
Pattern: monitor pending messages in queue → scale workers to process them
Multi-metric HPA: evaluate several metrics simultaneously, scale to whichever requires most replicas
KEDA (Kubernetes Event-driven Autoscaling): scale-to-zero capability, 70+ built-in scalers
KEDA pattern: 0 → 1 via event trigger, 1 → N via HPA metrics feed
Key insight: scale proactively based on how much work is waiting, not reactively based on how busy workers are
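The replica arithmetic behind these points can be sketched in a few lines. This is a minimal illustration, not the HPA controller itself: it uses the documented core formula `ceil(currentReplicas * currentMetricValue / desiredMetricValue)` and the "largest proposal wins" rule for multiple metrics, and it leaves out tolerance, stabilization windows, and readiness handling. The function names and the example numbers are hypothetical.

```python
import math


def desired_replicas(current_replicas: int, current_value: float, target_value: float) -> int:
    """Core HPA formula: ceil(currentReplicas * currentMetricValue / desiredMetricValue)."""
    if target_value <= 0:
        raise ValueError("target_value must be positive")
    return math.ceil(current_replicas * current_value / target_value)


def replicas_for_queue(queue_depth: int, target_per_worker: int) -> int:
    """Queue-depth proposal: enough workers that each has at most target_per_worker messages.

    This mirrors an object metric with an average-value target, where the
    current replica count cancels out of the core formula.
    """
    if target_per_worker <= 0:
        raise ValueError("target_per_worker must be positive")
    return math.ceil(queue_depth / target_per_worker)


def multi_metric_replicas(proposals: list[int], min_replicas: int, max_replicas: int) -> int:
    """Multi-metric rule: take the largest proposal, then clamp to the allowed range."""
    return max(min_replicas, min(max_replicas, max(proposals)))


if __name__ == "__main__":
    # Hypothetical numbers: 4 workers at 90% CPU against a 60% target, 45 queued messages.
    cpu_proposal = desired_replicas(current_replicas=4, current_value=90, target_value=60)
    queue_proposal = replicas_for_queue(queue_depth=45, target_per_worker=10)
    print(multi_metric_replicas([cpu_proposal, queue_proposal], min_replicas=1, max_replicas=10))
```

Note that the queue proposal depends only on how much work is waiting, which is the proactive signal described above; the CPU proposal depends on how busy the existing workers already are.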

Relevance to Teleo Pipeline

We don't run Kubernetes, but the patterns are directly transferable to our cron-based system:

Replace fixed MAX_WORKERS with queue-depth-based scaling: workers = f(queue_depth)
Implement scale-to-zero: if no unprocessed sources, don't spawn workers at all (we already do this)
Multi-metric scaling: consider both extract queue depth AND eval queue depth when deciding extraction worker count
The proactive scaling insight is key: our dispatcher should look at queue depth, not just worker availability
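One way the dispatcher logic above might look, as a sketch under stated assumptions. The constants, the function name, and the way eval queue depth is used are all hypothetical: the note only says to consider both queues, and this sketch reads that as backpressure (throttle extraction when evaluation is backed up), which is one possible interpretation rather than the pipeline's actual rule.

```python
import math

# Hypothetical tuning values; none of these come from the pipeline itself.
MAX_WORKERS = 8            # upper bound, replacing a fixed worker count
SOURCES_PER_WORKER = 5     # how many queued sources one worker should absorb per run
EVAL_BACKLOG_LIMIT = 50    # eval queue depth at which extraction is throttled


def extraction_worker_count(
    extract_depth: int,
    eval_depth: int,
    max_workers: int = MAX_WORKERS,
    sources_per_worker: int = SOURCES_PER_WORKER,
    eval_backlog_limit: int = EVAL_BACKLOG_LIMIT,
) -> int:
    """Return how many extraction workers a cron run should spawn."""
    # Scale-to-zero: nothing waiting, so spawn nothing.
    if extract_depth <= 0:
        return 0

    # workers = f(queue_depth): size the pool to the waiting work, up to the cap.
    wanted = min(math.ceil(extract_depth / sources_per_worker), max_workers)

    # Assumed backpressure rule: if evaluation is already backed up,
    # keep a single extraction worker so the eval queue can drain.
    if eval_depth >= eval_backlog_limit:
        wanted = min(wanted, 1)

    return wanted


if __name__ == "__main__":
    print(extraction_worker_count(extract_depth=12, eval_depth=0))
    print(extraction_worker_count(extract_depth=100, eval_depth=60))
```

If the intent is instead the HPA-style rule (scale to whichever metric asks for the most workers), the backpressure branch would be replaced by taking the maximum of the per-queue proposals.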
