rio: research pipeline scaling disciplines #630
16 changed files with 817 additions and 0 deletions
378
agents/rio/musings/research-pipeline-scaling.md
Normal file

@@ -0,0 +1,378 @@
---
type: musing
agent: rio
title: "Pipeline scaling architecture: queueing theory, backpressure, and optimal worker provisioning"
status: developing
created: 2026-03-12
updated: 2026-03-12
tags: [pipeline-architecture, operations-research, queueing-theory, mechanism-design, infrastructure]
---

# Pipeline Scaling Architecture: What Operations Research Tells Us

Research musing for Leo and Cory on how to optimally architect our three-stage pipeline (research → extract → eval) for variable-load scaling. Six disciplines investigated, each mapped to our specific system.

## Our System Parameters

Before diving into theory, let me nail down the numbers:

- **Arrival pattern**: Highly bursty. Research sessions dump 10-20 sources at once. Futardio launches come in bursts of 20+. Quiet periods produce 0-2 sources/day.
- **Extract stage**: 6 max workers, ~10-15 min per source (Claude compute). Dispatches every 5 min via cron.
- **Eval stage**: 5 max workers, ~5-15 min per PR (Claude compute). Dispatches every 5 min via cron.
- **Current architecture**: Fixed cron intervals, fixed worker caps, no backpressure, no priority queuing beyond basic triage (infra PRs first, then re-review, then fresh).
- **Cost model**: Workers are Claude Code sessions — expensive. Each idle worker costs nothing, but each active worker-minute is real money.
- **Queue sizes**: ~225 unprocessed sources, ~400 claims in KB.

---

## 1. Operations Research / Queueing Theory

### How it maps to our pipeline

Our pipeline is a **tandem queue** (a special case of a Jackson network): stages in series, each with multiple servers. In queueing notation:

- **Extract stage**: M[t]/G/6 queue — time-varying arrivals (non-Poisson), general service times (extraction complexity varies), 6 servers
- **Eval stage**: M[t]/G/5 queue — arrivals are departures from extract (so correlated), general service times, 5 servers

The classic M/M/c model gives us closed-form results for steady-state behavior:

**Little's Law** (L = λW) is the foundation. If the average arrival rate λ = 8 sources per 5-min cycle ≈ 0.027/sec, and the average extraction time W = 750 sec (12.5 min), then the average number of sources in the extract system is L = 0.027 × 750 ≈ 20. With 6 workers, the offered load per worker is ρ = 20/6 ≈ 3.3 — and any ρ > 1 means the queue grows without bound, so we'd need ~20 workers for steady state at this arrival rate. **This means our current MAX_WORKERS=6 for extraction is significantly undersized during burst periods.**

But bursts are temporary. During quiet periods, λ drops to near zero. The question isn't "how many workers for peak?" but "how do we adaptively size for current load?"

### Key insight: Square-root staffing

The **Halfin-Whitt regime** gives the answer: optimal workers = R + β√R, where R is the base load (λ/μ, arrival rate / service rate) and β ≈ 1-2 is a quality-of-service parameter.

For our system during a burst (λ = 20 sources in 5 min):

- R = 20 × (12.5 min / 5 min) = 50 source-slots needed → clearly impossible with 6 workers
- During burst: queue builds rapidly, workers drain it over subsequent cycles
- During quiet: R ≈ 0, workers = 0 + β√0 = 0 → don't spawn workers

The square-root staffing rule says: **don't size for peak. Size for current load plus a safety margin proportional to √(current load).** This is fundamentally different from our current fixed-cap approach.

### What to implement

**Phase 1 (now)**: Calculate ρ = queue_depth / (MAX_WORKERS × expected_service_time_in_cycles). If ρ > 1, the system is overloaded — scale up or implement backpressure. Log this metric.

**Phase 2 (soon)**: Replace fixed MAX_WORKERS with dynamic: workers = min(ceil(queue_depth / sources_per_worker_per_cycle) + ceil(√(queue_depth)), HARD_MAX). This implements square-root staffing.
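
The Phase 2 formula can be sketched directly in the dispatcher's own language. A minimal bash version, assuming an illustrative HARD_MAX of 10 and ~3 sources per worker per cycle (placeholders, not tuned values):

```bash
# Sketch of square-root staffing for the extract dispatcher.
# HARD_MAX=10 and per_worker=3 are illustrative assumptions.
desired_workers() {
  local queue_depth=$1 hard_max=10 per_worker=3
  local sqrt_q base w
  # ceil(sqrt(queue_depth)) via awk, since bash arithmetic is integer-only
  sqrt_q=$(awk -v q="$queue_depth" 'BEGIN { printf "%d", sqrt(q) + 0.999 }')
  # ceil(queue_depth / per_worker)
  base=$(( (queue_depth + per_worker - 1) / per_worker ))
  w=$(( base + sqrt_q ))
  (( w > hard_max )) && w=$hard_max
  echo "$w"
}
```

With an empty queue this returns 0 (scale-to-zero); during a large burst it saturates at the hard cap, and the queue drains over subsequent cycles.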

→ SOURCE: Bournassenko 2025, "On Queueing Theory for Large-Scale CI/CD Pipelines"
→ SOURCE: Whitt 2019, "What You Should Know About Queueing Models"
→ SOURCE: van Leeuwaarden et al. 2018, "Economies-of-Scale in Many-Server Queueing Systems" (SIAM Review)

---

## 2. Stochastic Modeling for Non-Stationary Arrivals

### How it maps to our pipeline

Our arrival process is a textbook **Markov-Modulated Poisson Process (MMPP)**. There's a hidden state governing the arrival rate:

| Hidden State | Arrival Rate | Duration |
|--------------|--------------|----------|
| Research session active | 10-20 sources/hour | 1-3 hours |
| Futardio launch burst | 20+ sources/dump | Minutes |
| Normal monitoring | 2-5 sources/day | Hours to days |
| Quiet period | 0-1 sources/day | Days |

The key finding from the literature: **replacing a time-varying arrival rate with a constant (average or max) leads to systems being badly understaffed or overstaffed.** This is exactly our problem. MAX_WORKERS=6 is undersized for bursts and oversized for quiet periods.

### The peakedness parameter

The **variance-to-mean ratio** (called "peakedness" or "dispersion ratio") of the arrival process determines how much extra capacity you need beyond standard queueing formulas:

- Peakedness = 1: Poisson process (standard formulas work)
- Peakedness > 1: Overdispersed/bursty (need MORE capacity than standard)
- Peakedness < 1: Underdispersed/smooth (need LESS capacity)

Our pipeline has peakedness >> 1 (highly bursty). The modified staffing formula adjusts the square-root safety margin by the peakedness factor. For bursty arrivals, the safety margin should be √(peakedness) × β√R instead of just β√R.

### Practical estimation

We can estimate peakedness empirically from our logs:

1. Count sources arriving per hour over the last 30 days
2. Calculate the mean and variance of hourly arrival counts
3. Peakedness = variance / mean

If peakedness ≈ 5 (plausible given our burst pattern), we need √5 ≈ 2.2× the safety margin that standard Poisson models suggest.
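
The three estimation steps reduce to a short pipeline over an arrivals log. A minimal sketch, assuming a hypothetical log format of one line per arriving source that starts with an ISO-8601 timestamp:

```bash
# Hypothetical log line: "2026-03-12T14:03:11 source-id". Caveat: hours
# with zero arrivals never appear in the log, so this understates the
# true variance; fill in empty buckets for a stricter estimate.
peakedness() {
  cut -c1-13 "$1" |    # keep "YYYY-MM-DDTHH" as the hour bucket
    sort | uniq -c |   # arrivals per hour
    awk '{ n++; s += $1; ss += $1 * $1 }
         END { m = s / n; printf "%.2f\n", (ss / n - m * m) / m }'
}
```

Run against two weeks of data, this gives the single number that plugs into the peakedness-adjusted staffing formula.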

### What to implement

**Phase 1**: Instrument arrival patterns. Log source arrivals per hour with timestamps. After 2 weeks, calculate peakedness.

**Phase 2**: Use the peakedness-adjusted staffing formula for worker provisioning. Different time windows may have different peakedness — weekdays vs. weekends, research-session hours vs. off-hours.

→ SOURCE: Whitt et al. 2016, "Staffing a Service System with Non-Poisson Non-Stationary Arrivals"
→ SOURCE: Liu et al. 2019, "Modeling and Simulation of Nonstationary Non-Poisson Arrival Processes" (CIATA method)
→ SOURCE: Simio/WinterSim 2018, "Resource Scheduling in Non-Stationary Service Systems"

---

## 3. Combinatorial Optimization / Scheduling

### How it maps to our pipeline

Our pipeline is a **hybrid flow-shop**: three stages (research → extract → eval), multiple workers at each stage, all sources flowing through the same stage sequence. This is important because it is:

- **Not a job-shop** (jobs don't have different stage orderings)
- **Not a simple flow-shop** (we have parallel workers within each stage)
- **A hybrid flow-shop with parallel machines per stage** — well-studied in the OR literature

The key question: given heterogeneous sources (varying complexity, different domains, different agents), how do we assign sources to workers optimally?

### Surprising finding: simple dispatching rules work

For hybrid flow-shops with relatively few stages and homogeneous workers within each stage, **simple priority dispatching rules perform within 5-10% of optimal**. The NP-hardness of the general job-shop scheduling problem (JSSP) is not relevant to our case because:

1. Our stages are fixed-order (not arbitrary routing)
2. Workers within a stage are roughly homogeneous (all Claude sessions)
3. We have few stages (3) and few workers (5-6 per stage)
4. We already have a natural priority ordering (infra > re-review > fresh)

The best simple rules for our setting:

- **Shortest Processing Time (SPT)**: Process shorter sources first — reduces average wait time
- **Priority + FIFO**: Within priority classes, process in arrival order
- **Weighted Shortest Job First (WSJF)**: Priority weight / estimated processing time — maximizes value delivery rate
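
As a sketch of how WSJF ordering could slot into the dispatcher, assuming a hypothetical queue file with one `priority_weight estimated_minutes source_path` line per pending source (that format is an assumption, not our current layout):

```bash
# WSJF: score = priority_weight / estimated_minutes; dispatch highest first.
# The queue-file format is hypothetical, for illustration only.
wsjf_order() {
  awk '{ printf "%.4f %s\n", $1 / $2, $3 }' "$1" | sort -rn | cut -d' ' -f2
}
```

With SPT as a special case (all weights equal), the same one-liner degrades gracefully to shortest-first.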

### What we should NOT do

Don't invest in metaheuristic scheduling algorithms (genetic algorithms, simulated annealing, tabu search). These are powerful for large-scale JSSP instances (100+ jobs, 20+ machines) but complete overkill at our scale. The gap between optimal and simple dispatching is tiny at our size.

### What to implement

**Phase 1 (now)**: Implement source complexity estimation. Short sources (tweets, brief articles) should be processed before long ones (whitepapers, multi-thread analyses). This is SPT — optimal for minimizing average flow time on a single machine, and near-optimal with parallel workers.

**Phase 2 (later)**: If we add domain-specific workers (e.g., Rio only processes internet-finance sources), the problem becomes a flexible flow-shop. Even then, simple "assign to least-loaded eligible worker" rules perform well.

→ SOURCE: ScienceDirect 2023, "The Flexible Job Shop Scheduling Problem: A Review"

---

## 4. Adaptive / Elastic Scaling

### How it maps to our pipeline

Cloud-native autoscaling patterns solve exactly our problem: scaling workers up/down based on observed demand, without full cloud infrastructure. The key patterns:

**Queue-depth-based scaling (KEDA pattern)**:

```
desired_workers = ceil(queue_depth / target_items_per_worker)
```

Where `target_items_per_worker` is calibrated to keep workers busy but not overloaded. KEDA adds scale-to-zero: if queue_depth = 0, workers = 0.

**Multi-metric scaling**: Evaluate multiple signals simultaneously, scale to whichever requires the most workers:

```
workers = max(
    ceil(unprocessed_sources / sources_per_worker),
    ceil(open_prs / prs_per_eval_worker),
    MIN_WORKERS
)
```

**Cooldown periods**: After scaling up, don't immediately scale down — wait for a cooldown period. This prevents oscillation when load is choppy. Kubernetes HPA uses a 5-minute downscale stabilization window by default.

### Adapting for our cron-based system

We don't have Kubernetes, but we can implement the same logic in bash:

```bash
# In extract-cron.sh, replace fixed MAX_WORKERS:
QUEUE_DEPTH=$(grep -rl "^status: unprocessed" inbox/archive/ | wc -l)
EVAL_BACKLOG=$(curl -sf "$FORGEJO_URL/api/v1/.../pulls?state=open" | jq 'length')

# Scale extraction workers based on queue depth (~3 sources per worker)
DESIRED_EXTRACT=$(( (QUEUE_DEPTH + 2) / 3 ))

# Apply backpressure from eval: if eval is backlogged, slow extraction
if [ "$EVAL_BACKLOG" -gt 10 ]; then
  DESIRED_EXTRACT=$(( DESIRED_EXTRACT / 2 ))
fi

# Clamp between a floor of 1 and the HARD_MAX cap
WORKERS=$(( DESIRED_EXTRACT < 1 ? 1 : DESIRED_EXTRACT ))
WORKERS=$(( WORKERS > HARD_MAX ? HARD_MAX : WORKERS ))
```

### Counterintuitive finding: scale-to-zero saves more than scale-to-peak

In our cost model (expensive per worker-minute, zero cost for idle), the biggest savings come not from optimizing peak performance but from **not running workers when there's nothing to do**. Our current system already checks for unprocessed sources before dispatching — good. But it still runs the dispatcher every 5 minutes even when the queue has been empty for hours. A longer polling interval during quiet periods would save dispatcher overhead.

### What to implement

**Phase 1 (now)**: Replace fixed MAX_WORKERS with a queue-depth-based formula. Add an eval backpressure check to the extract dispatcher.

**Phase 2 (soon)**: Add cooldown/hysteresis — different thresholds for scaling up vs. down.

**Phase 3 (later)**: Adaptive polling interval — faster polling when the queue is active, slower when quiet.

→ SOURCE: OneUptime 2026, "How to Implement HPA with Object Metrics for Queue-Based Scaling"
→ SOURCE: KEDA documentation, keda.sh

---

## 5. Backpressure & Flow Control

### How it maps to our pipeline

This is the most critical gap in our current architecture. **We have zero backpressure.** The three stages are decoupled with no feedback:

```
Research → [queue] → Extract → [queue] → Eval → [merge]
```

If research dumps 20 sources, extraction will happily create 20 PRs, and eval will struggle with a PR backlog. There's no signal from eval to extract saying "slow down, I'm drowning." This is the classic producer-consumer problem.

### The TCP analogy

TCP congestion control solves exactly this: a producer (sender) must match its rate to consumer (receiver) capacity, with the network as an intermediary that can drop packets (data loss) if overloaded. The solution: **feedback-driven rate adjustment**.

In our pipeline:

- **Producer**: Extract (creates PRs)
- **Consumer**: Eval (reviews PRs)
- **Congestion signal**: Open PR count growing
- **Data loss equivalent**: Eval quality degrading under load (rushed reviews)

### Four backpressure strategies

1. **Buffer + threshold**: Allow some PR accumulation (buffer), but when open PRs exceed a threshold, extract slows down. Simple, robust, our best first step.

2. **Rate matching**: Extract dispatches at most as many sources as eval processed in the previous cycle. Keeps the pipeline balanced but can under-utilize extract during catch-up periods.

3. **AIMD (Additive Increase Multiplicative Decrease)**: When the eval queue is shrinking, increase the extraction rate by 1 worker. When the eval queue is growing, halve extraction workers. Proven stable, converges to optimal throughput. **This is the TCP approach and it's elegant for our setting.**

4. **Pull-based**: Eval "pulls" work from a staging area instead of extract "pushing" PRs. Requires architectural change but guarantees eval is never overloaded. Kafka uses this pattern (consumers pull at their own pace).

### The AIMD insight is gold

AIMD is provably optimal for fair allocation of shared resources without centralized control (Corless et al. 2016). It's mathematically guaranteed to converge regardless of the number of agents or parameter values. For our pipeline:

```
Each cycle:
    if eval_queue_depth < eval_queue_depth_last_cycle:
        # Queue shrinking — additive increase
        extract_workers = min(extract_workers + 1, HARD_MAX)
    else:
        # Queue growing or stable — multiplicative decrease
        extract_workers = max(extract_workers / 2, 1)
```

This requires zero modeling, zero parameter estimation, zero prediction. It just reacts to observed system state and is proven to converge to the throughput that eval can sustain.
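
A concrete bash version of the same loop, persisting state across cron cycles in a file (the state-file path, the starting point of 1 worker, and the HARD_MAX of 10 are illustrative assumptions):

```bash
# AIMD sketch for the extract dispatcher. State ("workers last_depth")
# survives between cron invocations in a plain file; caps are illustrative.
aimd_workers() {
  local eval_depth=$1 state=$2 hard_max=10
  local workers last_depth
  if [[ -f $state ]]; then
    read -r workers last_depth < "$state"
  else
    workers=1; last_depth=0          # cold start: begin conservatively
  fi
  if (( eval_depth < last_depth )); then
    workers=$(( workers + 1 ))       # queue shrinking: additive increase
    (( workers > hard_max )) && workers=$hard_max
  else
    workers=$(( workers / 2 ))       # queue growing/stable: multiplicative decrease
    (( workers < 1 )) && workers=1
  fi
  echo "$workers $eval_depth" > "$state"
  echo "$workers"
}
```

Called once per dispatch cycle with the current open-PR count, this ramps up while eval keeps pace and backs off sharply the moment it falls behind.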

### What to implement

**Phase 1 (now, highest priority)**: Add a backpressure check to extract-cron.sh. Before dispatching extraction workers, check the open PR count. If open PRs > 15, reduce extraction parallelism by half. If open PRs > 25, skip this extraction cycle entirely.

**Phase 2 (soon)**: Implement AIMD scaling for extraction workers based on the eval queue trend.

**Phase 3 (later)**: Consider a pull-based architecture where eval signals readiness for more work.

→ SOURCE: Vlahakis et al. 2021, "AIMD Scheduling and Resource Allocation in Distributed Computing Systems"
→ SOURCE: Corless et al. 2016, "AIMD Dynamics and Distributed Resource Allocation" (SIAM)
→ SOURCE: Dagster, "What Is Backpressure"
→ SOURCE: Java Code Geeks 2025, "Reactive Programming Paradigms: Mastering Backpressure and Stream Processing"

---

## 6. Markov Decision Processes

### How it maps to our pipeline

An MDP formulates our scaling decision as a sequential optimization problem:

**State space**: S = (unprocessed_queue, in_flight_extractions, open_prs, active_extract_workers, active_eval_workers, time_of_day)

**Action space**: A = {add_extract_worker, remove_extract_worker, add_eval_worker, remove_eval_worker, wait}

**Transition model**: Queue depths change based on arrival rates (time-dependent) and service completions (stochastic).

**Cost function**: C(s, a) = worker_cost × active_workers + delay_cost × queue_depth

**Objective**: Find a policy π: S → A that minimizes expected total discounted cost.

### Key findings

1. **Optimal policies have threshold structure** (Li et al. 2019 survey): The optimal MDP policy is almost always of the form "if queue > X and workers < Y, spawn a worker." This means even without solving the full MDP, a well-tuned threshold policy is near-optimal.

2. **Hysteresis is optimal** (Tournaire et al. 2021): The optimal policy has different thresholds for scaling up vs. scaling down — scale up at queue=10, scale down at queue=3, not the same threshold. This prevents oscillation, which is exactly what AIMD achieves heuristically.

3. **Our state space is tractable**: With ~10 discrete queue levels × 6 extract worker levels × 5 eval worker levels × 4 time-of-day buckets = ~1,200 states. This is tiny for an MDP — value iteration converges in seconds. We could solve for the exact optimal policy.

4. **MDP outperforms heuristics, but not by much**: Tournaire et al. found that structured MDP algorithms outperform simple threshold heuristics, but the gap is modest (5-15% cost reduction). At our scale, a good threshold policy captures most of the value.

### The honest assessment

Solving the full MDP is theoretically clean but practically unnecessary at our scale. The MDP's main value is confirming that threshold policies with hysteresis are near-optimal — which validates implementing AIMD + backpressure thresholds as Phase 1 and not worrying about exact optimization until the system is much larger.

### What to implement

**Phase 1**: Don't solve the MDP. Implement threshold policies with hysteresis (different up/down thresholds) informed by MDP theory.
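
A minimal sketch of such a threshold policy with hysteresis; the 8/3 thresholds and the cap of 10 are illustrative starting points, not MDP-derived optima:

```bash
# Hysteresis: scale up above 8, down below 3; the dead band in between
# holds steady and prevents oscillation. Thresholds are assumptions.
scale_decision() {
  local queue=$1 workers=$2 up_at=8 down_at=3 hard_max=10
  if (( queue > up_at && workers < hard_max )); then
    echo up
  elif (( queue < down_at && workers > 0 )); then
    echo down
  else
    echo hold
  fi
}
```

The dispatcher would call this once per cycle and add or remove one worker per `up`/`down` decision.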

**Phase 2 (only if the system grows significantly)**: Formulate and solve the MDP using value iteration. Use historical arrival/service data to parameterize the transition model. The optimal policy becomes a lookup table: given the current state, take this action.

→ SOURCE: Tournaire et al. 2021, "Optimal Control Policies for Resource Allocation in the Cloud: MDP vs Heuristic Approaches"
→ SOURCE: Li et al. 2019, "An Overview for Markov Decision Processes in Queues and Networks"

---

## Synthesis: The Implementation Roadmap

### The core diagnosis

Our pipeline's architecture has three problems, in order of severity:

1. **No backpressure** — extraction can overwhelm evaluation with no feedback signal
2. **Fixed worker counts** — static MAX_WORKERS ignores queue state entirely
3. **No arrival modeling** — we treat all loads the same regardless of burst patterns

### Phase 1: Backpressure + Dynamic Scaling (implement now)

This captures 80% of the improvement with minimal complexity:

1. **Add eval backpressure to extract-cron.sh**: Check the open PR count before dispatching. If backlogged, reduce extraction parallelism.
2. **Replace fixed MAX_WORKERS with a queue-depth formula**: `workers = min(ceil(queue_depth / 3) + 1, HARD_MAX)`
3. **Add hysteresis**: Scale up when queue > 8, scale down when queue < 3. Different thresholds prevent oscillation.
4. **Instrument everything**: Log queue depths, worker counts, cycle times, utilization rates.

### Phase 2: AIMD Scaling (implement within 2 weeks)

Replace the fixed formulas with adaptive AIMD:

1. Track the eval queue trend (growing vs. shrinking) across cycles
2. Growing queue → multiplicative decrease of the extraction rate
3. Shrinking queue → additive increase of the extraction rate
4. This self-tunes without requiring parameter estimation

### Phase 3: Arrival Modeling + Optimization (implement within 1 month)

With 2+ weeks of instrumented data:

1. Calculate the peakedness of the arrival process
2. Apply peakedness-adjusted square-root staffing for worker provisioning
3. If warranted, formulate and solve the MDP for the exact optimal policy
4. Implement adaptive polling intervals (faster when active, slower when quiet)

### Surprising findings

1. **Simple dispatching rules are near-optimal at our scale.** The combinatorial optimization literature says: for a hybrid flow-shop with <10 machines per stage, SPT/FIFO within priority classes is within 5-10% of optimal. Don't build a scheduler; build a good priority queue.

2. **AIMD is the single most valuable algorithm to implement.** It's proven stable, requires no modeling, and handles the backpressure and scaling problems simultaneously. TCP solved this exact problem 40 years ago.

3. **The MDP confirms we don't need the MDP.** The optimal policy is threshold-based with hysteresis — exactly what AIMD + backpressure thresholds give us. The MDP's value is validation, not computation.

4. **The square-root staffing rule means diminishing returns on workers.** Adding a 7th worker to a 6-worker system helps less than adding a 2nd worker to a 1-worker system. At our scale, the marginal worker is still valuable, but there's a real ceiling around 8-10 extraction workers and 6-8 eval workers beyond which additional workers waste money.
5. **Our biggest waste isn't too few workers — it's running the dispatcher against an empty queue.** The extract cron runs every 5 minutes regardless of queue state. If the queue has been empty for 6 hours, that's 72 unnecessary dispatcher invocations. Adaptive polling (or event-driven triggering) would eliminate this overhead.

6. **The pipeline's binding constraint is eval, not extract.** Extract can produce work faster than eval consumes it (6 extract workers handling ~8 sources/cycle vs. 5 eval workers handling ~5 PRs/cycle). Without backpressure, this imbalance causes PR accumulation. The right fix is rate-matching extraction to evaluation throughput, not speeding up extraction.

→ CLAIM CANDIDATE: "Backpressure is the highest-leverage architectural improvement for multi-stage pipelines because it prevents the most common failure mode (producer overwhelming consumer) with minimal implementation complexity"

→ CLAIM CANDIDATE: "AIMD provides near-optimal resource allocation for variable-load pipelines without requiring arrival modeling or parameter estimation because its convergence properties are independent of system parameters"

→ CLAIM CANDIDATE: "Simple priority dispatching rules perform within 5-10% of optimal for hybrid flow-shop scheduling at moderate scale because the combinatorial explosion that makes JSSP NP-hard only matters at large scale"

→ FLAG @leo: The mechanism design parallel is striking — backpressure in pipelines is structurally identical to price signals in markets. Both are feedback mechanisms that prevent producers from oversupplying when consumers can't absorb. AIMD in particular mirrors futarchy's self-correcting property: the system converges to optimal throughput through local feedback, not central planning.

→ FLAG @theseus: The MDP formulation of pipeline scaling connects to AI agent resource allocation. If agents are managing their own compute budgets, AIMD provides a decentralized mechanism for fair sharing without requiring a central coordinator.

@@ -0,0 +1,30 @@
---
type: source
title: "Staffing a Service System with Non-Poisson Non-Stationary Arrivals"
author: "Ward Whitt et al. (Cambridge Core)"
url: https://www.cambridge.org/core/journals/probability-in-the-engineering-and-informational-sciences/article/abs/staffing-a-service-system-with-nonpoisson-nonstationary-arrivals/0F42FDA80A8B0B197D3D9E0B040A43D2
date: 2016-01-01
domain: internet-finance
format: paper
status: unprocessed
tags: [pipeline-architecture, operations-research, stochastic-modeling, non-stationary-arrivals, capacity-sizing]
---
# Staffing a Service System with Non-Poisson Non-Stationary Arrivals

Extends the square-root staffing formula to handle non-Poisson arrival processes, including non-stationary Cox processes where the arrival rate itself is a stochastic process.

## Key Content

- The standard Poisson assumption fails when arrivals are bursty or time-varying
- Introduces "peakedness" — the variance-to-mean ratio of the arrival process — as the key parameter for non-Poisson adjustment
- Modified staffing formula: adjust the square-root safety margin by the peakedness factor
- For bursty arrivals (peakedness > 1), you need MORE safety capacity than Poisson models suggest
- For smooth arrivals (peakedness < 1), you need LESS
- Practical: replacing time-varying arrival rates with a constant (average or max) leads to badly under- or over-staffed systems

## Relevance to Teleo Pipeline

Our arrival process is highly non-stationary: research dumps are bursty (15 sources at once), futardio launches come in bursts of 20+, while some days are quiet. This is textbook non-Poisson non-stationary. The peakedness parameter captures exactly how bursty our arrivals are and tells us how much extra capacity we need beyond the basic square-root staffing rule.

Key insight: using a constant MAX_WORKERS regardless of current queue state is the worst of both worlds — too many workers during quiet periods (wasted compute), too few during bursts (queue explosion).

@@ -0,0 +1,28 @@
---
type: source
title: "AIMD Dynamics and Distributed Resource Allocation"
author: "Martin J. Corless, C. King, R. Shorten, F. Wirth (SIAM)"
url: https://epubs.siam.org/doi/book/10.1137/1.9781611974225
date: 2016-01-01
domain: internet-finance
format: paper
status: unprocessed
tags: [pipeline-architecture, operations-research, AIMD, distributed-resource-allocation, congestion-control, fairness]
---
# AIMD Dynamics and Distributed Resource Allocation

SIAM monograph on AIMD (Additive Increase Multiplicative Decrease) as a general-purpose distributed resource allocation mechanism. Extends the TCP congestion control principle to resource allocation in computing, energy, and other domains.

## Key Content

- AIMD is the most widely used method for allocating limited resources among competing agents without centralized control
- Core algorithm: additive increase when no congestion (rate += α), multiplicative decrease when congestion is detected (rate *= β, where 0 < β < 1)
- Provably fair: converges to equal sharing of available bandwidth/capacity
- Provably stable: the system converges regardless of the number of agents or parameter values
- Three sample applications: internet congestion control, smart grid energy allocation, distributed computing
- Key property: no global information needed — each agent only needs to observe local congestion signals

## Relevance to Teleo Pipeline

AIMD provides a principled, proven scaling algorithm: when the eval queue is shrinking (no congestion), increase extraction workers by 1 per cycle. When the eval queue is growing (congestion), halve extraction workers. This doesn't require predicting load, modeling arrivals, or solving optimization problems — it reacts to observed system state and is mathematically guaranteed to converge. Perfect for our "expensive compute, variable load" setting.

@@ -0,0 +1,28 @@
---
type: source
title: "Economies-of-Scale in Many-Server Queueing Systems: Tutorial and Partial Review of the QED Halfin-Whitt Heavy-Traffic Regime"
author: "Johan van Leeuwaarden, Britt Mathijsen, Jaron Sanders (SIAM Review)"
url: https://epubs.siam.org/doi/10.1137/17M1133944
date: 2018-01-01
domain: internet-finance
format: paper
status: unprocessed
tags: [pipeline-architecture, operations-research, queueing-theory, Halfin-Whitt, economies-of-scale, square-root-staffing]
---
# Economies-of-Scale in Many-Server Queueing Systems

SIAM Review tutorial on the QED (Quality-and-Efficiency-Driven) Halfin-Whitt heavy-traffic regime — the mathematical foundation for understanding when and how multi-server systems achieve economies of scale.

## Key Content

- The QED regime: operate near full utilization while keeping delays manageable
- As the server count n grows, utilization approaches 1 at rate Θ(1/√n) — the "square-root staffing" principle
- Economies of scale: larger systems need proportionally fewer excess servers for the same service quality
- The regime applies to systems ranging from tens to thousands of servers
- Square-root safety staffing works empirically even for moderate-sized systems (5-20 servers)
- The tutorial connects abstract queueing theory to practical staffing decisions

## Relevance to Teleo Pipeline

At our scale (5-6 workers), we're in the "moderate system" range where square-root staffing still provides useful guidance. The key takeaway: we don't need sophisticated algorithms for a system this small. Simple threshold policies informed by queueing theory will capture most of the benefit. The economies-of-scale result also tells us that if we grow to 20+ workers, the marginal value of each additional worker decreases — important for cost optimization.
@ -0,0 +1,27 @@
|
||||||
|
---
type: source
title: "Resource Scheduling in Non-Stationary Service Systems"
author: "Simio / WinterSim 2018"
url: https://www.simio.com/resources/papers/WinterSim2018/Resource-Scheduling-In-Non-stationary-Service-Systems.php
date: 2018-12-01
domain: internet-finance
format: paper
status: unprocessed
tags: [pipeline-architecture, stochastic-modeling, non-stationary-arrivals, resource-scheduling, simulation]
---

# Resource Scheduling in Non-Stationary Service Systems

WinterSim 2018 paper on scheduling resources (servers/workers) when arrival rates change over time. Addresses the gap between theoretical queueing models (which assume stationarity) and real systems (which don't).

## Key Content

- Non-stationary service systems require time-varying staffing — fixed worker counts are suboptimal
- The goal: determine the number of servers as a function of time
- Without server constraints there would be no waiting time, but this wastes capacity, since arrivals are stochastic and nonstationary
- Simulation-based approach: use discrete-event simulation to test staffing policies against realistic arrival patterns
- Key tradeoff: responsiveness (adding workers fast when load spikes) vs. efficiency (not wasting workers during quiet periods)

## Relevance to Teleo Pipeline

Directly applicable: our pipeline needs time-varying worker counts, not fixed MAX_WORKERS. The paper validates the approach of measuring queue depth and adjusting workers dynamically rather than using static cron-based fixed pools.
---
|
||||||
|
type: source
|
||||||
|
title: "Modeling and Simulation of Nonstationary Non-Poisson Arrival Processes"
|
||||||
|
author: "Yunan Liu et al. (NC State)"
|
||||||
|
url: https://yunanliu.wordpress.ncsu.edu/files/2019/11/CIATApublished.pdf
|
||||||
|
date: 2019-01-01
|
||||||
|
domain: internet-finance
|
||||||
|
format: paper
|
||||||
|
status: unprocessed
|
||||||
|
tags: [pipeline-architecture, stochastic-modeling, non-stationary-arrivals, MMPP, batch-arrivals]
|
||||||
|
---
|
||||||
|
|
||||||
|
# Modeling and Simulation of Nonstationary Non-Poisson Arrival Processes
|
||||||
|
|
||||||
|
Introduces the CIATA (Combined Inversion-and-Thinning Approach) method for modeling nonstationary non-Poisson processes characterized by a rate function, mean-value function, and asymptotic variance-to-mean (dispersion) ratio.
|
||||||
|
|
||||||
|
## Key Content
|
||||||
|
|
||||||
|
- Standard Poisson process assumptions break down when arrivals are bursty or correlated
|
||||||
|
- CIATA models target arrival processes via rate function + dispersion ratio — captures both time-varying intensity and burstiness
|
||||||
|
- The Markov-MECO process (a Markovian arrival process / MAP) models interarrival times as absorption times of a continuous-time Markov chain
|
||||||
|
- Markov-Modulated Poisson Process (MMPP): arrival rate switches between states governed by a hidden Markov chain — natural model for "bursty then quiet" patterns
|
||||||
|
- Key finding: replacing a time-varying arrival rate with a constant (max or average) leads to systems being badly understaffed or overstaffed
|
||||||
|
- Congestion measures are increasing functions of arrival process variability — more bursty = more capacity needed
|
||||||
|
|
||||||
|
## Relevance to Teleo Pipeline
|
||||||
|
|
||||||
|
Our arrival process is textbook MMPP: there's a hidden state (research session happening vs. quiet period) that governs the arrival rate. During research sessions, sources arrive in bursts of 10-20. During quiet periods, maybe 0-2 per day. The MMPP framework models this directly and gives us tools to size capacity for the mixture of states rather than the average.
|
||||||
|
|
@ -0,0 +1,29 @@
|
||||||
|
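A toy two-state MMPP generator makes the "bursty then quiet" pattern concrete. All rates and switching probabilities below are illustrative guesses, not fitted parameters:

```python
import math
import random

def simulate_mmpp(days: int, p_burst: float = 0.1, p_quiet: float = 0.5,
                  rate_burst: float = 15.0, rate_quiet: float = 1.0,
                  seed: int = 0) -> list[int]:
    """Daily arrival counts from a 2-state MMPP.

    Hidden state: 'quiet' flips to 'burst' w.p. p_burst each day;
    'burst' flips back w.p. p_quiet. Arrivals each day are Poisson
    at the current state's rate.
    """
    rng = random.Random(seed)
    state, counts = "quiet", []
    for _ in range(days):
        rate = rate_burst if state == "burst" else rate_quiet
        # Poisson draw via Knuth's multiplication method.
        limit, k, p = math.exp(-rate), 0, 1.0
        while True:
            p *= rng.random()
            if p <= limit:
                break
            k += 1
        counts.append(k)
        if state == "quiet" and rng.random() < p_burst:
            state = "burst"
        elif state == "burst" and rng.random() < p_quiet:
            state = "quiet"
    return counts
```

Feeding traces like this into a staffing simulation shows why capacity sized for the average day drowns during burst states.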
---
type: source
title: "What You Should Know About Queueing Models"
author: "Ward Whitt (Columbia University)"
url: https://www.columbia.edu/~ww2040/shorter041907.pdf
date: 2019-04-19
domain: internet-finance
format: paper
status: unprocessed
tags: [pipeline-architecture, operations-research, queueing-theory, square-root-staffing, Halfin-Whitt]
---

# What You Should Know About Queueing Models

Practitioner-oriented guide by Ward Whitt (Columbia), one of the founders of modern queueing theory for service systems. Covers the essential queueing models practitioners need and introduces the Halfin-Whitt heavy-traffic regime.

## Key Content

- Square-root staffing principle: optimal server count = base load + β√(base load), where β is a quality-of-service parameter
- The Halfin-Whitt (QED) regime: systems operate near full utilization while keeping delays manageable — utilization approaches 1 at rate Θ(1/√n) as servers n grow
- Economies of scale in multi-server systems: larger systems need proportionally fewer excess servers
- Practical formulas for determining server counts given arrival rates and service level targets
- Erlang C formula as the workhorse for staffing calculations

## Relevance to Teleo Pipeline

The square-root staffing rule is directly applicable: if our base load requires R workers at full utilization, we should provision R + β√R workers where β ≈ 1-2 depending on target service level. For our scale (~8 sources/cycle, ~5 min service time), this gives concrete worker count guidance.

Critical insight: you don't need to match peak load with workers. The square-root safety margin handles variance efficiently. Over-provisioning for peak is wasteful; under-provisioning for average causes queue explosion. The sweet spot is the QED regime.
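The rule is a one-liner. A sketch with our numbers: the 8-sources-per-5-min burst rate and a ~12.5 min midpoint service time are taken from the parameters above, and β = 1.5 is an assumed quality target:

```python
import math

def staffing(arrival_rate_per_min: float, service_min: float,
             beta: float = 1.5) -> int:
    """Square-root staffing: c = R + beta * sqrt(R),
    where R = offered load (Little's Law floor)."""
    offered_load = arrival_rate_per_min * service_min
    return math.ceil(offered_load + beta * math.sqrt(offered_load))
```

`staffing(8 / 5, 12.5)` comes out at 27 workers, far above our 6, which mainly shows the burst rate is not sustainable as a steady state; the rule argues for sizing against the sustained average plus a square-root margin, not the peak.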
29
inbox/archive/2019-07-00-li-overview-mdp-queues-networks.md
Normal file
---
type: source
title: "An Overview for Markov Decision Processes in Queues and Networks"
author: "Quan-Lin Li, Jing-Yu Ma, Rui-Na Fan, Li Xia"
url: https://arxiv.org/abs/1907.10243
date: 2019-07-24
domain: internet-finance
format: paper
status: unprocessed
tags: [pipeline-architecture, operations-research, markov-decision-process, queueing-theory, dynamic-programming]
---

# An Overview for Markov Decision Processes in Queues and Networks

Comprehensive 42-page survey of MDP applications in queueing systems, covering 60+ years of research from the 1960s to present.

## Key Content

- Continuous-time MDPs for queue management: decisions happen at state transitions (arrivals, departures)
- Classic results: optimal policies often have threshold structure — "serve if queue > K, idle if queue < K"
- For multi-server systems: optimal admission and routing policies are often simple (join-shortest-queue, threshold-based)
- Dynamic programming and stochastic optimization provide tools for deriving optimal policies
- Key challenge: curse of dimensionality — state space explodes with multiple queues/stages
- Practical approaches: approximate dynamic programming, reinforcement learning for large state spaces
- Emerging direction: deep RL for queue management in networks and cloud computing

## Relevance to Teleo Pipeline

Our pipeline has a manageable state space (queue depths across 3 stages, worker counts, time-of-day) — small enough for exact MDP solution via value iteration. The survey confirms that optimal policies for our type of system typically have threshold structure: "if queue > X and workers < Y, spawn a worker." This means even without solving the full MDP, a well-tuned threshold policy will be near-optimal.
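The state space really is small enough for exact value iteration. A toy version, with dynamics and costs invented purely for illustration (one arrival and at most one departure per step):

```python
def value_iteration_policy(Q=12, actions=(0, 1, 2), lam=0.5, mu=0.4,
                           hold_cost=1.0, worker_cost=0.6,
                           gamma=0.95, sweeps=500):
    """Exact value iteration on a toy queue-control MDP.

    State: queue depth q in 0..Q. Action: worker count for the step.
    Dynamics: one arrival w.p. lam; one departure w.p. min(1, a*mu)
    when the queue is nonempty. Cost: holding plus worker-minutes.
    """
    V = [0.0] * (Q + 1)
    policy = [0] * (Q + 1)
    for _ in range(sweeps):
        new_V = [0.0] * (Q + 1)
        for q in range(Q + 1):
            best_cost, best_a = float("inf"), 0
            for a in actions:
                p_dep = min(1.0, a * mu) if q > 0 else 0.0
                step_cost = hold_cost * q + worker_cost * a
                exp_v = 0.0
                for arr, p_a in ((1, lam), (0, 1.0 - lam)):
                    for dep, p_d in ((1, p_dep), (0, 1.0 - p_dep)):
                        nq = min(Q, max(0, q + arr - dep))
                        exp_v += p_a * p_d * V[nq]
                total = step_cost + gamma * exp_v
                if total < best_cost:
                    best_cost, best_a = total, a
            new_V[q], policy[q] = best_cost, best_a
        V = new_V
    return policy
```

Per the survey's structural results, the returned policy should be threshold-shaped: idle at an empty queue, all workers on at a deep one.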
---
type: source
title: "Optimal Control Policies for Resource Allocation in the Cloud: Comparison Between Markov Decision Process and Heuristic Approaches"
author: "Thomas Tournaire, Hind Castel-Taleb, Emmanuel Hyon"
url: https://arxiv.org/abs/2104.14879
date: 2021-04-30
domain: internet-finance
format: paper
status: unprocessed
tags: [pipeline-architecture, operations-research, markov-decision-process, cloud-autoscaling, optimal-control]
---

# Optimal Control Policies for Resource Allocation in the Cloud

Compares MDP-based optimal scaling policies against heuristic approaches for cloud auto-scaling. The MDP formulation treats VM provisioning as a sequential decision problem.

## Key Content

- Auto-scaling problem: VMs turned on/off based on queue occupation to minimize combined energy + performance cost
- MDP formulation: states = queue lengths + active VMs, actions = add/remove VMs, rewards = negative cost (energy + SLA violations)
- Value iteration and policy iteration algorithms find optimal threshold policies
- Structured MDP algorithms incorporating hysteresis properties outperform heuristics in both execution time and accuracy
- Hysteresis: different thresholds for scaling up vs. scaling down — prevents oscillation (e.g., scale up at queue=10, scale down at queue=3)
- MDP algorithms find optimal hysteresis thresholds automatically

## Relevance to Teleo Pipeline

The MDP formulation maps directly: states = (unprocessed queue, in-flight extractions, open PRs, active workers), actions = (spawn worker, kill worker, wait), cost = (Claude compute cost per worker-minute + delay cost per queued source). The hysteresis insight is particularly valuable — we should have different thresholds for spinning up vs. spinning down workers to prevent oscillation.

Key finding: structured MDP with hysteresis outperforms simple threshold heuristics. But even simple threshold policies (scale up at queue=N, scale down at queue=M where M < N) perform reasonably well.
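A hysteresis controller is a few lines. This sketch reuses the paper's example thresholds (up at 10, down at 3); the function name and the 5-worker cap are ours, not the paper's:

```python
def hysteresis_scaler(queue_depth: int, active_workers: int,
                      up_at: int = 10, down_at: int = 3,
                      max_workers: int = 5) -> int:
    """Two-threshold (hysteresis) scaling: distinct up/down triggers
    so the worker count does not oscillate around a single threshold."""
    if queue_depth >= up_at and active_workers < max_workers:
        return active_workers + 1
    if queue_depth <= down_at and active_workers > 0:
        return active_workers - 1
    return active_workers  # dead band between down_at and up_at
```

The dead band between the two thresholds is the whole point: a queue hovering near a single threshold would otherwise spawn and kill a worker every cycle.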
---
type: source
title: "AIMD Scheduling and Resource Allocation in Distributed Computing Systems"
author: "Vlahakis, Athanasopoulos et al."
url: https://arxiv.org/abs/2109.02589
date: 2021-09-06
domain: internet-finance
format: paper
status: unprocessed
tags: [pipeline-architecture, operations-research, AIMD, distributed-computing, resource-allocation, congestion-control]
---

# AIMD Scheduling and Resource Allocation in Distributed Computing Systems

Applies TCP's AIMD (Additive Increase Multiplicative Decrease) congestion control to distributed computing resource allocation — scheduling incoming requests across computing nodes.

## Key Content

- Models distributed system as multi-queue scheme with computing nodes
- Proposes AIMD-like admission control: stable irrespective of total node count and AIMD parameters
- Key insight: congestion control in networks and worker scaling in compute pipelines are the same problem — matching producer rate to consumer capacity
- Decentralized resource allocation using nonlinear state feedback achieves global convergence to bounded set in finite time
- Connects to QoS via Little's Law: local queuing time calculable from simple formula
- AIMD is proven optimal for fair allocation of shared resources among competing agents without centralized control

## Relevance to Teleo Pipeline

AIMD provides an elegant scaling policy: when queue is shrinking (system healthy), add workers linearly (e.g., +1 per cycle). When queue is growing (system overloaded), cut workers multiplicatively (e.g., halve them). This is self-correcting, proven stable, and doesn't require predicting load — it reacts to observed queue state.

The TCP analogy is precise: our pipeline "bandwidth" is eval throughput. When extract produces faster than eval can consume, we need backpressure (slow extraction) or scale-up (more eval workers). AIMD handles this naturally.
---
type: source
title: "Using Little's Law to Scale Applications"
author: "Dan Slimmon"
url: https://blog.danslimmon.com/2022/06/07/using-littles-law-to-scale-applications/
date: 2022-06-07
domain: internet-finance
format: essay
status: unprocessed
tags: [pipeline-architecture, operations-research, queueing-theory, littles-law, capacity-planning]
---

# Using Little's Law to Scale Applications

Practitioner guide showing how Little's Law (L = λW) provides a simple but powerful tool for capacity planning in real systems.

## Key Content

- Little's Law: L = λW where L = average items in system, λ = arrival rate, W = average time per item
- Rearranged for capacity: (total worker threads) ≥ (arrival rate)(average processing time)
- Practical example: 1000 req/s × 0.34s = 340 concurrent requests needed
- Important caveat: Little's Law gives long-term averages only — real systems need buffer capacity beyond the theoretical minimum to handle variance
- The formula guides capacity planning but isn't a complete scaling solution — it's the floor, not the ceiling

## Relevance to Teleo Pipeline

Direct application: if we process ~8 sources per extraction cycle (every 5 min) and each takes ~10-15 min of Claude compute, Little's Law says L = (8/300s) × 750s ≈ 20 sources in-flight at steady state. With 6 workers, that would mean ~3.3 sources in flight per worker — but each worker processes one source at a time, so if that burst rate were sustained, the queue would build without bound.

More practically: λ = average sources per second, W = average extraction time. Total workers needed ≥ λ × W. This gives us the minimum worker floor. The square-root staffing rule gives us the safety margin above that floor.
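The floor calculation in one line, with the document's illustrative figures (8 sources per 5-min cycle, 12.5 min midpoint service time) as the worked example:

```python
def worker_floor(arrivals_per_min: float, avg_service_min: float) -> float:
    """Little's Law, L = lambda * W: the minimum number of concurrent
    workers needed to keep up with the arrival rate on average."""
    return arrivals_per_min * avg_service_min
```

`worker_floor(8 / 5, 12.5)` gives 20.0, matching the ~20 in-flight sources computed above; any safety margin sits on top of this floor.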
---
type: source
title: "The Flexible Job Shop Scheduling Problem: A Review"
author: "ScienceDirect review article"
url: https://www.sciencedirect.com/science/article/pii/S037722172300382X
date: 2023-01-01
domain: internet-finance
format: paper
status: unprocessed
tags: [pipeline-architecture, operations-research, combinatorial-optimization, job-shop-scheduling, flexible-scheduling]
---

# The Flexible Job Shop Scheduling Problem: A Review

Comprehensive review of the Flexible Job Shop Scheduling Problem (FJSP) — a generalization of classical JSSP where operations can be processed on any machine from a set of eligible machines.

## Key Content

- Classical Job Shop Scheduling Problem (JSSP): n jobs, m machines, fixed operation-to-machine mapping, NP-complete for m > 2
- Flexible JSSP (FJSP): operations can run on any eligible machine — adds machine assignment as a decision variable
- Flow-shop: all jobs follow the same machine order (our pipeline: research → extract → eval)
- Job-shop: jobs can have different machine orders (not our case)
- Hybrid flow-shop: multiple machines at each stage, jobs follow same stage order but can use any machine within a stage (THIS is our model)
- Solution approaches: metaheuristics (genetic algorithms, simulated annealing, tabu search) dominate for NP-hard instances
- Recent trend: multi-agent reinforcement learning for dynamic scheduling with worker heterogeneity and uncertainty

## Relevance to Teleo Pipeline

Our pipeline is a **hybrid flow-shop**: three stages (research → extract → eval), multiple workers at each stage, all sources flow through the same stage sequence. This is computationally easier than general JSSP. Key insight: for a hybrid flow-shop with relatively few stages and homogeneous workers within each stage, simple priority dispatching rules (shortest-job-first, FIFO within priority classes) perform within 5-10% of optimal. We don't need metaheuristics — we need good dispatching rules.
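FIFO-within-priority-class needs nothing more than a stable sort. A sketch using our triage order (infra first, then re-review, then fresh); the PR names are made up:

```python
# Priority classes from the pipeline's triage order; lower sorts first.
PRIORITY = {"infra": 0, "re-review": 1, "fresh": 2}

def dispatch_order(jobs):
    """Sort (kind, name) pairs by priority class. Python's sort is
    stable, so arrival order is preserved within each class: FIFO."""
    return sorted(jobs, key=lambda job: PRIORITY[job[0]])

jobs = [("fresh", "pr-101"), ("infra", "pr-102"),
        ("fresh", "pr-103"), ("re-review", "pr-104")]
# dispatch_order(jobs) → pr-102, pr-104, pr-101, pr-103
```

Relying on sort stability rather than timestamps keeps the dispatcher trivial: the queue order itself encodes arrival order.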
29
inbox/archive/2024-00-00-dagster-data-backpressure.md
Normal file
---
type: source
title: "What Is Backpressure"
author: "Dagster"
url: https://dagster.io/glossary/data-backpressure
date: 2024-01-01
domain: internet-finance
format: essay
status: unprocessed
tags: [pipeline-architecture, backpressure, data-pipelines, flow-control]
---

# What Is Backpressure (Dagster)

Dagster's practical guide to backpressure in data pipelines. Written for practitioners building real data processing systems.

## Key Content

- Backpressure: feedback mechanism preventing data producers from overwhelming consumers
- Without backpressure controls: data loss, crashes, resource exhaustion
- Consumer signals producer about capacity limits
- Implementation strategies: buffering (with threshold triggers), rate limiting, dynamic adjustment, acknowledgment-based flow
- Systems using backpressure: Apache Kafka (pull-based consumption), Flink, Spark Streaming, Akka Streams, Project Reactor
- Tradeoff: backpressure introduces latency but prevents catastrophic failure
- Key principle: design backpressure into the system from the start

## Relevance to Teleo Pipeline

Our pipeline has zero backpressure today. The extract-cron.sh checks for unprocessed sources and dispatches workers regardless of eval queue state. If extraction outruns evaluation, PRs accumulate with no feedback signal. Simple fix: extraction dispatcher should check open PR count before dispatching. If open PRs > threshold, reduce extraction parallelism or skip the cycle.
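A sketch of what that dispatcher check could look like; the function, the PR threshold, and the linear throttle are all hypothetical, not existing extract-cron.sh logic:

```python
def extraction_slots(unprocessed: int, open_prs: int,
                     max_workers: int = 6, pr_threshold: int = 10) -> int:
    """Backpressure-aware dispatch: throttle extraction as the eval
    queue (open PRs) fills, instead of dispatching blindly."""
    if open_prs >= pr_threshold:
        return 0  # eval is drowning: skip this cycle entirely
    # Scale available slots down linearly with eval backlog.
    headroom = (pr_threshold - open_prs) / pr_threshold
    return min(unprocessed, max(1, int(max_workers * headroom)))
```

The cron job would call this once per cycle and dispatch that many workers; the linear ramp is one choice among many, but any monotone-decreasing function of eval backlog gives the missing feedback signal.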
---
type: source
title: "On Queueing Theory for Large-Scale CI/CD Pipelines Optimization"
author: "Grégory Bournassenko"
url: https://arxiv.org/abs/2504.18705
date: 2025-04-25
domain: internet-finance
format: paper
status: unprocessed
tags: [pipeline-architecture, operations-research, queueing-theory, ci-cd, M/M/c-queue]
---

# On Queueing Theory for Large-Scale CI/CD Pipelines Optimization

Academic paper applying classical M/M/c queueing theory to model CI/CD pipeline systems. Proposes a queueing theory modeling framework to optimize large-scale build/test workflows using multi-server queue models.

## Key Content

- Addresses bottleneck formation in high-volume shared infrastructure pipelines
- Models pipeline stages as M/M/c queues (Poisson arrivals, exponential service, c servers)
- Integrates theoretical queueing analysis with practical optimization — dynamic scaling and prioritization of CI/CD tasks
- Framework connects arrival rate modeling to worker count optimization
- Demonstrates that classical queueing models provide actionable guidance for real software pipelines

## Relevance to Teleo Pipeline

Direct parallel: our extract/eval pipeline IS a multi-stage CI/CD-like system. Sources arrive (Poisson-ish), workers process them (variable service times), and queue depth determines throughput. The M/M/c framework gives us closed-form solutions for expected wait times given worker counts.

Key insight: M/M/c queues show that adding workers has diminishing returns — the marginal improvement of worker N+1 decreases as N grows. This means there's an optimal worker count beyond which additional workers waste compute without meaningfully reducing queue wait times.
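The closed form is easy to compute. A standard Erlang C implementation (the textbook formula, not code from the paper):

```python
import math

def erlang_c_wait(lam: float, mu: float, c: int) -> float:
    """Mean wait in queue (Wq) for an M/M/c system via Erlang C.
    lam = arrival rate, mu = per-server service rate, c = servers.
    Stability requires lam < c * mu."""
    a = lam / mu                       # offered load in Erlangs
    rho = a / c                        # per-server utilization
    assert rho < 1, "unstable: need lam < c * mu"
    s = sum(a**k / math.factorial(k) for k in range(c))
    pc = (a**c / math.factorial(c)) / (1 - rho)
    p_wait = pc / (s + pc)             # Erlang C: P(arrival must wait)
    return p_wait / (c * mu - lam)
```

Sweeping c for a fixed load shows the diminishing returns directly: with lam=4, mu=1, the mean wait drops sharply from 5 to 6 servers and much less from 6 to 7.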
---
type: source
title: "Reactive Programming Paradigms: Mastering Backpressure and Stream Processing"
author: "Java Code Geeks"
url: https://www.javacodegeeks.com/2025/12/reactive-programming-paradigms-mastering-backpressure-and-stream-processing.html
date: 2025-12-01
domain: internet-finance
format: essay
status: unprocessed
tags: [pipeline-architecture, backpressure, reactive-streams, flow-control, producer-consumer]
---

# Reactive Programming Paradigms: Mastering Backpressure and Stream Processing

Practitioner guide to implementing backpressure in reactive stream processing systems. Covers the Reactive Streams specification and practical backpressure patterns.

## Key Content

- Reactive Streams standard: Publisher/Subscriber/Subscription interfaces with demand-based flow control
- Subscriber requests N items → Publisher delivers at most N → prevents overwhelming
- Four backpressure strategies:
  1. **Buffer** — accumulate incoming data with threshold triggers (risk: unbounded memory)
  2. **Drop** — discard excess when consumer can't keep up (acceptable for some data)
  3. **Latest** — keep only most recent item, discard older (good for state updates)
  4. **Error** — signal failure when buffer overflows (forces architectural fix)
- Practical implementations: Project Reactor (Spring WebFlux), Akka Streams, RxJava
- Key insight: backpressure must be designed into the system from the start — bolting it on later is much harder

## Relevance to Teleo Pipeline

Our pipeline currently has NO backpressure. Extract produces PRs that accumulate in eval's queue without any feedback mechanism. If research dumps 20 sources, extraction creates 20 PRs, and eval drowns trying to process them all. We need a "buffer + rate limit" strategy: extraction should check eval queue depth before starting new work, and slow down or pause when eval is backlogged.
---
type: source
title: "How to Implement HPA with Object Metrics for Queue-Based Scaling"
author: "OneUptime"
url: https://oneuptime.com/blog/post/2026-02-09-hpa-object-metrics-queue/view
date: 2026-02-09
domain: internet-finance
format: essay
status: unprocessed
tags: [pipeline-architecture, kubernetes, autoscaling, queue-based-scaling, KEDA, HPA]
---

# How to Implement HPA with Object Metrics for Queue-Based Scaling

Practical guide to implementing Kubernetes HPA scaling based on queue depth rather than CPU/memory metrics. Covers object metrics, custom metrics, and integration patterns.

## Key Content

- Queue depth is a better scaling signal than CPU for worker-style workloads
- Object metrics in HPA allow scaling based on custom Kubernetes objects (ConfigMaps, custom resources)
- Pattern: monitor pending messages in queue → scale workers to process them
- Multi-metric HPA: evaluate several metrics simultaneously, scale to whichever requires most replicas
- KEDA (Kubernetes Event Driven Autoscaler): scale-to-zero capability, 70+ built-in scalers
- KEDA pattern: 0 → 1 via event trigger, 1 → N via HPA metrics feed
- Key insight: scale proactively based on how much work is waiting, not reactively based on how busy workers are

## Relevance to Teleo Pipeline

We don't run Kubernetes, but the patterns are directly transferable to our cron-based system:

1. Replace fixed MAX_WORKERS with queue-depth-based scaling: workers = f(queue_depth)
2. Implement scale-to-zero: if no unprocessed sources, don't spawn workers at all (we already do this)
3. Multi-metric scaling: consider both extract queue depth AND eval queue depth when deciding extraction worker count
4. The proactive scaling insight is key: our dispatcher should look at queue depth, not just worker availability
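Translated to our cron dispatcher, points 1-3 fit in one small function; the names, the 3-sources-per-worker target, and the eval backlog cap are hypothetical constants, not current config:

```python
def desired_workers(extract_queue: int, eval_queue: int,
                    per_worker: int = 3, max_workers: int = 6,
                    eval_backlog_cap: int = 10) -> int:
    """Queue-depth-driven scaling in the HPA/KEDA style: scale to the
    work waiting, not to worker busyness, with the eval queue acting
    as a second metric that can veto extraction scale-up."""
    if extract_queue == 0:
        return 0  # scale to zero when nothing is waiting
    # ceil(extract_queue / per_worker), capped at max_workers
    want = min(max_workers, -(-extract_queue // per_worker))
    if eval_queue >= eval_backlog_cap:
        return min(want, 1)  # downstream congested: trickle only
    return want
```

The cron dispatcher would call this each cycle with the two queue depths and spawn or drain toward the returned count, which gives us KEDA-style scale-to-zero and multi-metric behavior without Kubernetes.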