teleo-codex/inbox/archive/2024-00-00-dagster-data-backpressure.md
Rio 25a98b60ab rio: research pipeline scaling disciplines (#630)
Co-authored-by: Rio <rio@agents.livingip.xyz>
Co-committed-by: Rio <rio@agents.livingip.xyz>
2026-03-12 03:48:10 +00:00

type: source
title: What Is Backpressure
author: Dagster
url: https://dagster.io/glossary/data-backpressure
date: 2024-01-01
domain: internet-finance
format: essay
status: unprocessed
tags: pipeline-architecture, backpressure, data-pipelines, flow-control

What Is Backpressure (Dagster)

Dagster's practical guide to backpressure in data pipelines, written for practitioners building real data processing systems.

Key Content

  • Backpressure: feedback mechanism preventing data producers from overwhelming consumers
  • Without backpressure controls: data loss, crashes, resource exhaustion
  • Consumer signals producer about capacity limits
  • Implementation strategies: buffering (with threshold triggers), rate limiting, dynamic adjustment, acknowledgment-based flow
  • Systems using backpressure: Apache Kafka (pull-based consumption), Flink, Spark Streaming, Akka Streams, Project Reactor
  • Tradeoff: backpressure introduces latency but prevents catastrophic failure
  • Key principle: design backpressure into the system from the start
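
To make the buffering strategy above concrete, here is a minimal sketch of a bounded buffer between a fast producer and a slow consumer, using Python's standard `queue.Queue`. It is an illustration, not code from the Dagster guide: the function name `run_pipeline`, the buffer size, and the simulated 1 ms consumer delay are all assumptions chosen for the example.

```python
import queue
import threading
import time
from typing import List, Tuple


def run_pipeline(n_items: int, buffer_size: int) -> Tuple[List[int], int]:
    """Run a producer and a slow consumer joined by a bounded buffer.

    Returns the consumed items and the deepest buffer level observed.
    (Hypothetical helper for illustration only.)
    """
    buf: "queue.Queue[object]" = queue.Queue(maxsize=buffer_size)
    sentinel = object()  # marks end of stream
    consumed: List[int] = []

    def consumer() -> None:
        while True:
            item = buf.get()
            if item is sentinel:
                break
            time.sleep(0.001)  # simulate a consumer slower than the producer
            consumed.append(item)  # type: ignore[arg-type]

    worker = threading.Thread(target=consumer)
    worker.start()

    max_depth = 0
    for i in range(n_items):
        # put() blocks while the buffer is full. That blocking is the
        # backpressure signal: the producer is slowed to the consumer's pace
        # instead of piling up unbounded work.
        buf.put(i)
        max_depth = max(max_depth, buf.qsize())

    buf.put(sentinel)
    worker.join()
    return consumed, max_depth


if __name__ == "__main__":
    items, depth = run_pipeline(n_items=20, buffer_size=4)
    print(f"consumed {len(items)} items, buffer never exceeded {depth}")
```

The tradeoff noted above shows up directly: the producer spends time blocked in `put()` (added latency), but the buffer depth stays capped at `buffer_size`, so nothing is dropped and memory stays bounded.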

Relevance to Teleo Pipeline

Our pipeline has no backpressure today. extract-cron.sh checks for unprocessed sources and dispatches workers regardless of the state of the eval queue. If extraction outruns evaluation, PRs accumulate and nothing signals extraction to slow down. A simple fix: the extraction dispatcher checks the open PR count before dispatching. If open PRs exceed a threshold, it reduces extraction parallelism or skips the cycle.
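
A minimal sketch of that dispatcher check, written as a pure Python function so the policy can be tested separately from the cron script. Everything named here is hypothetical: `workers_to_dispatch`, the split into a `soft_limit` (start reducing parallelism) and a `hard_limit` (skip the cycle), and the linear scaling in between are one possible reading of "reduce or skip", not the existing behavior of extract-cron.sh. How the open PR count is obtained is left to the caller.

```python
def workers_to_dispatch(
    open_prs: int, soft_limit: int, hard_limit: int, max_workers: int
) -> int:
    """Decide how many extraction workers to start this cycle.

    Assumed policy (hypothetical, for illustration):
      - open_prs <= soft_limit: full parallelism
      - open_prs >= hard_limit: skip the cycle (0 workers)
      - in between: scale workers down linearly, but keep at least 1
    """
    if soft_limit < 0 or hard_limit <= soft_limit or max_workers < 1:
        raise ValueError("need 0 <= soft_limit < hard_limit and max_workers >= 1")

    if open_prs >= hard_limit:
        return 0  # eval queue is saturated: skip this cycle
    if open_prs <= soft_limit:
        return max_workers  # eval queue has headroom: no throttling

    # Integer arithmetic keeps the result deterministic.
    remaining = hard_limit - open_prs
    span = hard_limit - soft_limit
    return max(1, max_workers * remaining // span)


if __name__ == "__main__":
    for prs in (0, 5, 10, 14, 15):
        n = workers_to_dispatch(prs, soft_limit=5, hard_limit=15, max_workers=4)
        print(f"open PRs={prs:2d} -> dispatch {n} worker(s)")
```

With `soft_limit=5`, `hard_limit=15`, and `max_workers=4`, this dispatches 4 workers at 0 or 5 open PRs, 2 at 10, 1 at 14, and none at 15. A single threshold with skip-only behavior would be simpler still; the two-limit version is shown because it degrades throughput gradually instead of toggling between full speed and idle.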