Rio 099253fa12 rio: research pipeline scaling disciplines (#630 )

Co-authored-by: Rio <rio@agents.livingip.xyz>
Co-committed-by: Rio <rio@agents.livingip.xyz>

2026-03-12 00:30:19 +00:00



What You Should Know About Queueing Models

Practitioner-oriented guide by Ward Whitt (Columbia), one of the founders of modern queueing theory for service systems. Covers the essential queueing models practitioners need and introduces the Halfin-Whitt heavy-traffic regime.

Key Content

Square-root staffing principle: optimal server count = base load + β√(base load), where the base load is the offered load (arrival rate × mean service time) and β is a quality-of-service parameter
The Halfin-Whitt (QED) regime: systems operate near full utilization while keeping delays manageable; the gap between utilization and 1 shrinks as Θ(1/√n) as the number of servers n grows
Economies of scale in multi-server systems: larger systems need proportionally fewer excess servers
Practical formulas for determining server counts given arrival rates and service level targets
Erlang C formula as the workhorse for staffing calculations
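To make the Erlang C bullet concrete, here is a minimal sketch of the standard M/M/n delay-probability calculation, computed through the numerically stable Erlang B recursion. The function names `erlang_c` and `min_servers` are illustrative and do not come from the guide, and using the probability of waiting as the service-level target is an assumption; other targets (for example, mean wait) are equally plausible.

```python
import math


def erlang_c(servers: int, offered_load: float) -> float:
    """Probability that an arrival must wait in an M/M/n queue (Erlang C).

    offered_load is arrival rate x mean service time, in Erlangs.
    """
    if offered_load >= servers:
        return 1.0  # unstable system: every arrival ends up waiting

    # Erlang B via the stable recursion B(k) = a*B(k-1) / (k + a*B(k-1)).
    blocking = 1.0
    for k in range(1, servers + 1):
        blocking = offered_load * blocking / (k + offered_load * blocking)

    utilization = offered_load / servers
    # Standard conversion from Erlang B to Erlang C.
    return blocking / (1.0 - utilization * (1.0 - blocking))


def min_servers(offered_load: float, max_wait_prob: float) -> int:
    """Smallest server count whose waiting probability meets the target."""
    servers = max(1, math.floor(offered_load) + 1)  # need servers > load
    while erlang_c(servers, offered_load) > max_wait_prob:
        servers += 1
    return servers
```

With a single server at 50% utilization, `erlang_c(1, 0.5)` reduces to the familiar M/M/1 result that the waiting probability equals the utilization.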

Relevance to Teleo Pipeline

The square-root staffing rule is directly applicable: if our base load requires R workers at full utilization, we should provision R + β√R workers, where β ≈ 1 to 2 depending on the target service level. For our scale (~8 sources/cycle, ~5 min service time), R is the arrival rate multiplied by the service time, so the rule yields a concrete worker count once the cycle length is fixed.
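The arithmetic behind this rule can be sketched as follows. The note does not state the cycle length, so the 60-minute cycle used in the example is purely an assumption for illustration, and the helper names `offered_load` and `sqrt_staffing` are hypothetical.

```python
import math


def offered_load(arrivals_per_cycle: float,
                 cycle_minutes: float,
                 service_minutes: float) -> float:
    """Base load R = arrival rate x mean service time."""
    return arrivals_per_cycle / cycle_minutes * service_minutes


def sqrt_staffing(load: float, beta: float) -> int:
    """Worker count R + beta * sqrt(R), rounded up to a whole worker."""
    return math.ceil(load + beta * math.sqrt(load))


# ASSUMPTION: a 60-minute cycle; the note gives only ~8 sources/cycle
# and ~5 min service time.
R = offered_load(arrivals_per_cycle=8, cycle_minutes=60, service_minutes=5)
workers_beta_1 = sqrt_staffing(R, beta=1.0)  # -> 2 under this assumption
workers_beta_2 = sqrt_staffing(R, beta=2.0)  # -> 3 under this assumption
```

The same function also illustrates the economies of scale listed under Key Content: at β = 1, a base load of 4 needs 6 workers (50% headroom), while a base load of 100 needs 110 (10% headroom).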

Critical insight: you don't need to match peak load with workers. The square-root safety margin absorbs variance efficiently. Provisioning for peak is wasteful; provisioning at or below the average load lets the queue grow without bound. The sweet spot is the QED regime.
