extract: 2024-00-00-dagster-data-backpressure
Some checks are pending
Sync Graph Data to teleo-app / sync (push) Waiting to run

Pentagon-Agent: Ganymede <F99EBFA6-547B-4096-BEEA-1D59C3E4028A>
This commit is contained in:
Teleo Pipeline 2026-03-15 16:01:51 +00:00 committed by Leo
parent f45744b576
commit 74a5a7ae64
2 changed files with 53 additions and 1 deletions

View file

@ -0,0 +1,41 @@
---
type: claim
domain: internet-finance
description: "Flow control mechanism that signals producers to slow down when consumers reach capacity limits"
confidence: proven
source: "Dagster, What Is Backpressure glossary entry, 2024"
created: 2026-03-11
---
# Backpressure prevents pipeline failure by creating feedback loop between consumer capacity and producer rate
Backpressure is a flow control mechanism where data consumers signal producers about their capacity limits, preventing system overload. Without backpressure controls, pipelines experience data loss, crashes, and resource exhaustion when producers overwhelm consumers.
The mechanism operates through several implementation strategies:
- **Buffering with threshold triggers** — queues that signal when capacity approaches limits
- **Rate limiting** — explicit caps on production speed
- **Dynamic adjustment** — real-time scaling based on consumer state
- **Acknowledgment-based flow** — producers wait for consumer confirmation before sending more data
Major distributed systems implement backpressure as core architecture: Apache Kafka uses pull-based consumption where consumers control their own rate, while Flink, Spark Streaming, Akka Streams, and Project Reactor all build backpressure into their execution models.
The tradeoff is explicit: backpressure introduces latency (producers must wait for consumer signals) but prevents catastrophic failure modes. This makes backpressure a design-time decision, not a retrofit — systems must incorporate feedback channels from the start.
## Evidence
- Dagster documentation identifies backpressure as standard pattern across Apache Kafka, Flink, Spark Streaming, Akka Streams, Project Reactor
- Implementation strategies documented: buffering, rate limiting, dynamic adjustment, acknowledgment-based flow
- Failure modes without backpressure: data loss, crashes, resource exhaustion
## Relevance to Teleo
The Teleo pipeline currently has zero backpressure. The extract-cron.sh dispatcher checks for unprocessed sources and launches workers without checking eval queue state. If extraction outruns evaluation, PRs accumulate with no feedback signal to slow extraction.
Simple implementation: extraction dispatcher should check open PR count before dispatching. If open PRs exceed threshold, reduce extraction parallelism or skip the cycle entirely. This creates the feedback loop that prevents eval queue overload.
---
Relevant Notes:
- domains/internet-finance/_map
Topics:
- core/mechanisms/_map

View file

@ -6,8 +6,13 @@ url: https://dagster.io/glossary/data-backpressure
date: 2024-01-01
domain: internet-finance
format: essay
status: unprocessed
status: processed
tags: [pipeline-architecture, backpressure, data-pipelines, flow-control]
processed_by: rio
processed_date: 2026-03-11
claims_extracted: ["backpressure-prevents-pipeline-failure-by-creating-feedback-loop-between-consumer-capacity-and-producer-rate.md"]
extraction_model: "anthropic/claude-sonnet-4.5"
extraction_notes: "Single claim extracted on backpressure as flow control mechanism. Source is practical implementation guide rather than research, so confidence is 'proven' based on widespread production adoption. Teleo pipeline relevance noted in claim body as concrete application context."
---
# What Is Backpressure (Dagster)
@ -27,3 +32,9 @@ Dagster's practical guide to backpressure in data pipelines. Written for practit
## Relevance to Teleo Pipeline
Our pipeline has zero backpressure today. The extract-cron.sh checks for unprocessed sources and dispatches workers regardless of eval queue state. If extraction outruns evaluation, PRs accumulate with no feedback signal. Simple fix: extraction dispatcher should check open PR count before dispatching. If open PRs > threshold, reduce extraction parallelism or skip the cycle.
## Key Facts
- Backpressure implementations: buffering with thresholds, rate limiting, dynamic adjustment, acknowledgment-based flow
- Systems using backpressure: Apache Kafka (pull-based), Flink, Spark Streaming, Akka Streams, Project Reactor
- Failure modes without backpressure: data loss, crashes, resource exhaustion