extract: 2024-00-00-dagster-data-backpressure

Pentagon-Agent: Ganymede <F99EBFA6-547B-4096-BEEA-1D59C3E4028A>
2026-03-15 16:01:51 +00:00 · 2026-03-15 16:01:51 +00:00 · 74a5a7ae64
commit 74a5a7ae64
parent f45744b576
2 changed files with 53 additions and 1 deletions
--- a/domains/internet-finance/backpressure-prevents-pipeline-failure-by-creating-feedback-loop-between-consumer-capacity-and-producer-rate.md
+++ b/domains/internet-finance/backpressure-prevents-pipeline-failure-by-creating-feedback-loop-between-consumer-capacity-and-producer-rate.md
@ -0,0 +1,41 @@
+---
+type: claim
+domain: internet-finance
+description: "Flow control mechanism that signals producers to slow down when consumers reach capacity limits"
+confidence: proven
+source: "Dagster, What Is Backpressure glossary entry, 2024"
+created: 2026-03-11
+---
+
+# Backpressure prevents pipeline failure by creating feedback loop between consumer capacity and producer rate
+
+Backpressure is a flow control mechanism where data consumers signal producers about their capacity limits, preventing system overload. Without backpressure controls, pipelines experience data loss, crashes, and resource exhaustion when producers overwhelm consumers.
+
+The mechanism operates through several implementation strategies:
+- **Buffering with threshold triggers** — queues that signal when capacity approaches limits
+- **Rate limiting** — explicit caps on production speed
+- **Dynamic adjustment** — real-time scaling based on consumer state
+- **Acknowledgment-based flow** — producers wait for consumer confirmation before sending more data
+
+Major distributed systems implement backpressure as core architecture: Apache Kafka uses pull-based consumption where consumers control their own rate, while Flink, Spark Streaming, Akka Streams, and Project Reactor all build backpressure into their execution models.
+
+The tradeoff is explicit: backpressure introduces latency (producers must wait for consumer signals) but prevents catastrophic failure modes. This makes backpressure a design-time decision, not a retrofit — systems must incorporate feedback channels from the start.
+
+## Evidence
+- Dagster documentation identifies backpressure as standard pattern across Apache Kafka, Flink, Spark Streaming, Akka Streams, Project Reactor
+- Implementation strategies documented: buffering, rate limiting, dynamic adjustment, acknowledgment-based flow
+- Failure modes without backpressure: data loss, crashes, resource exhaustion
+
+## Relevance to Teleo
+
+The Teleo pipeline currently has zero backpressure. The extract-cron.sh dispatcher checks for unprocessed sources and launches workers without checking eval queue state. If extraction outruns evaluation, PRs accumulate with no feedback signal to slow extraction.
+
+Simple implementation: extraction dispatcher should check open PR count before dispatching. If open PRs exceed threshold, reduce extraction parallelism or skip the cycle entirely. This creates the feedback loop that prevents eval queue overload.
+
+---
+
+Relevant Notes:
+- domains/internet-finance/_map
+
+Topics:
+- core/mechanisms/_map
--- a/inbox/archive/2024-00-00-dagster-data-backpressure.md
+++ b/inbox/archive/2024-00-00-dagster-data-backpressure.md
@ -6,8 +6,13 @@ url: https://dagster.io/glossary/data-backpressure
 date: 2024-01-01
 domain: internet-finance
 format: essay
-status: unprocessed
+status: processed
 tags: [pipeline-architecture, backpressure, data-pipelines, flow-control]
+processed_by: rio
+processed_date: 2026-03-11
+claims_extracted: ["backpressure-prevents-pipeline-failure-by-creating-feedback-loop-between-consumer-capacity-and-producer-rate.md"]
+extraction_model: "anthropic/claude-sonnet-4.5"
+extraction_notes: "Single claim extracted on backpressure as flow control mechanism. Source is practical implementation guide rather than research, so confidence is 'proven' based on widespread production adoption. Teleo pipeline relevance noted in claim body as concrete application context."
 ---

 # What Is Backpressure (Dagster)
@ -27,3 +32,9 @@ Dagster's practical guide to backpressure in data pipelines. Written for practit
 ## Relevance to Teleo Pipeline

 Our pipeline has zero backpressure today. The extract-cron.sh checks for unprocessed sources and dispatches workers regardless of eval queue state. If extraction outruns evaluation, PRs accumulate with no feedback signal. Simple fix: extraction dispatcher should check open PR count before dispatching. If open PRs > threshold, reduce extraction parallelism or skip the cycle.
+
+
+## Key Facts
+- Backpressure implementations: buffering with thresholds, rate limiting, dynamic adjustment, acknowledgment-based flow
+- Systems using backpressure: Apache Kafka (pull-based), Flink, Spark Streaming, Akka Streams, Project Reactor
+- Failure modes without backpressure: data loss, crashes, resource exhaustion