Pipeline reliability (8 fixes, reviewed by Ganymede+Rhea+Leo+Rio):
1. Merge API recovery — pre-flight approval check, transient/permanent distinction, jitter
2. Ghost PR detection — ls-remote branch check in reconciliation, network guard
3. Source status contract — directory IS status, no code change needed
4. Batch-state markers eliminated — two-gate skip (archive-check + batched branch-check)
5. Branch SHA tracking — batched ls-remote, auto-reset verdicts, dismiss stale reviews
6. Mirror pre-flight permissions — chown check in sync-mirror.sh
7. Telegram archive commit-after-write — git add/commit/push with rebase --abort fallback
8. Post-merge source archiving — queue/ → archive/{domain}/ after merge
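The two-gate skip in item 4 can be sketched as a single decision function. This is a minimal illustration, not the pipeline's actual code: `gate_decision` and its argument order are hypothetical, and the `extract/` branch prefix is taken from the batch script. It compares the ref name as an exact string, so dots in a file name are not treated as regex wildcards.

```shell
#!/bin/bash
# Hypothetical helper: decide what to do with one queued source.
#   $1 = archive root directory
#   $2 = cached output of `git ls-remote --heads origin`
#   $3 = source basename (without .md)
gate_decision() {
  local archive_root=$1 remote_heads=$2 name=$3

  # Gate 1: a file with this name anywhere under the archive means "already processed".
  if [ -n "$(find "$archive_root" -name "$name.md" -print -quit 2>/dev/null)" ]; then
    echo "skip-archived"
    return
  fi

  # Gate 2: an exact ref match in the cached listing means "extraction in progress".
  # ls-remote prints "<sha><TAB><ref>", so field 2 is the ref name.
  if printf '%s\n' "$remote_heads" |
     awk -v ref="refs/heads/extract/$name" '$2 == ref { found = 1 } END { exit !found }'; then
    echo "skip-in-progress"
    return
  fi

  # Neither gate fired: the source is new.
  echo "extract"
}
```

Because the remote listing is passed in as a string, the caller can fetch it once per batch and reuse it for every source.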
Pipeline fixes:
- merge_cycled flag — eval attempts preserved during merge-failure cycling (Ganymede+Rhea)
- merge_failures diagnostic counter
- Startup recovery preserves eval_attempts (was incorrectly resetting to 0)
- No-diff PRs auto-closed by eval (root cause of 17 zombie PRs)
- GC threshold aligned with substantive fixer budget (was 2, now 4)
- Conflict retry with 3-attempt budget + permanent conflict handler
- Local ff-merge fallback for Forgejo 405 errors
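A rough sketch of how a fixed retry budget, a transient/permanent split, and jittered backoff can fit together. Everything here is an assumption made for illustration: the function names, the `MAX_ATTEMPTS` and `BACKOFF_BASE` variables, and the choice of which HTTP statuses count as transient are not taken from the pipeline. In this sketch a 405 is classed as permanent, which is where a caller could hand off to a fallback such as the local ff-merge.

```shell
#!/bin/bash
# Hypothetical classifier: which HTTP statuses are worth retrying.
classify_status() {
  case "$1" in
    2??)         echo "ok" ;;
    408|429|5??) echo "transient" ;;
    *)           echo "permanent" ;;
  esac
}

# Exponential backoff with jitter: base * 2^(attempt-1), plus 0..base-1 seconds.
backoff_seconds() {
  local attempt=$1 base=${2:-2}
  echo $(( base * (1 << (attempt - 1)) + RANDOM % base ))
}

# Run "$@" (a command that prints an HTTP status code) within a fixed budget.
# Returns 0 on success, 1 on a permanent error, 2 when the budget is spent.
retry_with_budget() {
  local max=${MAX_ATTEMPTS:-3} attempt status
  for (( attempt = 1; attempt <= max; attempt++ )); do
    status=$("$@")
    case "$(classify_status "$status")" in
      ok)        return 0 ;;
      permanent) return 1 ;;
    esac
    # Transient failure: wait before the next attempt, but not after the last one.
    if [ "$attempt" -lt "$max" ]; then
      sleep "$(backoff_seconds "$attempt" "${BACKOFF_BASE:-2}")"
    fi
  done
  return 2
}
```

Returning distinct codes for "permanent" and "budget spent" lets the caller count failures separately from retries.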
Telegram bot:
- KB retrieval: 3-layer (entity resolution → claim search → agent context)
- Reply-to-bot handler (context.bot.id check)
- Tag regex: @teleo|@futairdbot
- Prompt rewrite for natural analyst voice
- Market data API integration (Ben's token price endpoint)
- Conversation windows (5-message unanswered counter, per-user-per-chat)
- Conversation history in prompt (last 5 exchanges)
- Worktree file lock for archive writes
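To illustrate the tag regex above, here is a small shell sketch of a mention check. The function name `mentions_bot` is hypothetical, and two behaviours are assumptions added for the example rather than anything the bot is known to do: matching is case-insensitive, and the handle must not be embedded in a longer word (so an email address does not count as a tag).

```shell
#!/bin/bash
# Hypothetical check: does a message tag the bot by either handle?
mentions_bot() {
  local msg re
  # Lowercase the message so the match is case-insensitive (assumption).
  msg=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  # Handle must be delimited by start/end of text or a non-username character (assumption).
  re='(^|[^a-z0-9_])@(teleo|futairdbot)([^a-z0-9_]|$)'
  [[ $msg =~ $re ]]
}
```

The pattern is kept in a variable so that Bash treats it as a regular expression rather than parsing the parentheses itself.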
Infrastructure:
- worktree_lock.py — file-based lock (flock) for main worktree coordination
- backfill-sources.py — source DB registration for Argus funnel
- batch-extract-50.sh v3 — two-gate skip, batched ls-remote, network guard
- sync-mirror.sh — auto-PR creation for mirrored GitHub branches, permission pre-flight
- Argus dashboard — conflicts + reviewing in backlog, queue count in funnel
- Enrichment-inside-frontmatter bug fix (regex anchor, not --- split)
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
145 lines
4.6 KiB
Bash
Executable file
#!/bin/bash
# Batch extract sources from inbox/queue/: v3 with two-gate skip logic
#
# Uses a separate extract/ worktree (not main/) to prevent a daemon race condition.
# Skip logic uses two checks instead of local marker files (Ganymede v3 review):
#   Gate 1: Is source already in archive/{domain}/? → already processed, dedup
#   Gate 2: Does extraction branch exist on Forgejo? → extraction in progress
#   Neither → extract
#
# Architecture: Ganymede (two-gate) + Rhea (separate worktrees)

REPO=/opt/teleo-eval/workspaces/extract
MAIN_REPO=/opt/teleo-eval/workspaces/main
EXTRACT=/opt/teleo-eval/openrouter-extract-v2.py
CLEANUP=/opt/teleo-eval/post-extract-cleanup.py
LOG=/opt/teleo-eval/logs/batch-extract-50.log
TOKEN=$(cat /opt/teleo-eval/secrets/forgejo-leo-token)
FORGEJO_URL="http://localhost:3000"
MAX=50
COUNT=0
SUCCESS=0
FAILED=0
SKIPPED=0

# Lockfile to prevent concurrent runs
LOCKFILE="/tmp/batch-extract.lock"
if [ -f "$LOCKFILE" ]; then
  pid=$(cat "$LOCKFILE" 2>/dev/null)
  if kill -0 "$pid" 2>/dev/null; then
    echo "[$(date)] SKIP: batch extract already running (pid $pid)" >> "$LOG"
    exit 0
  fi
  # Recorded process is gone: remove the stale lock
  rm -f "$LOCKFILE"
fi
echo $$ > "$LOCKFILE"
trap 'rm -f "$LOCKFILE"' EXIT

echo "[$(date)] Starting batch extraction of $MAX sources" >> "$LOG"

cd "$REPO" || exit 1
git fetch origin main 2>/dev/null
git checkout -f main 2>/dev/null
git reset --hard origin/main 2>/dev/null

# Get sources in queue
SOURCES=$(ls inbox/queue/*.md 2>/dev/null | head -n "$MAX")

# Batch fetch all remote branches once (Ganymede: 1 call instead of 84)
if ! REMOTE_BRANCHES=$(git ls-remote --heads origin 2>/dev/null); then
  echo "[$(date)] ABORT: git ls-remote failed — remote unreachable, skipping cycle" >> "$LOG"
  exit 0
fi

# Word-splitting on $SOURCES assumes queue file names contain no whitespace
for SOURCE in $SOURCES; do
  COUNT=$((COUNT + 1))
  BASENAME=$(basename "$SOURCE" .md)
  BRANCH="extract/$BASENAME"

  # Gate 1: Already in archive? Source was already processed, so dedup (Ganymede)
  if find "$MAIN_REPO/inbox/archive" -name "$BASENAME.md" 2>/dev/null | grep -q .; then
    echo "[$(date)] [$COUNT/$MAX] SKIP $BASENAME (already in archive)" >> "$LOG"
    # Delete the queue duplicate
    rm -f "$MAIN_REPO/inbox/queue/$BASENAME.md" 2>/dev/null
    SKIPPED=$((SKIPPED + 1))
    continue
  fi

  # Gate 2: Branch exists on Forgejo? Extraction already in progress (cached lookup)
  if echo "$REMOTE_BRANCHES" | grep -q "refs/heads/$BRANCH$"; then
    echo "[$(date)] [$COUNT/$MAX] SKIP $BASENAME (branch exists — in progress)" >> "$LOG"
    SKIPPED=$((SKIPPED + 1))
    continue
  fi

  echo "[$(date)] [$COUNT/$MAX] Processing $BASENAME" >> "$LOG"

  # Reset to main
  git checkout -f main 2>/dev/null
  git fetch origin main 2>/dev/null
  git reset --hard origin/main 2>/dev/null

  # Clean stale remote branch (Leo's catch: prevents checkout conflicts)
  git push origin --delete "$BRANCH" 2>/dev/null

  # Create fresh branch
  git branch -D "$BRANCH" 2>/dev/null
  if ! git checkout -b "$BRANCH" 2>/dev/null; then
    echo " -> SKIP (branch creation failed)" >> "$LOG"
    SKIPPED=$((SKIPPED + 1))
    continue
  fi

  # Run extraction
  python3 "$EXTRACT" "$SOURCE" --no-review >> "$LOG" 2>&1
  EXTRACT_RC=$?

  if [ $EXTRACT_RC -ne 0 ]; then
    FAILED=$((FAILED + 1))
    echo " -> FAILED (extract rc=$EXTRACT_RC)" >> "$LOG"
    continue
  fi

  # Post-extraction cleanup
  python3 "$CLEANUP" "$REPO" >> "$LOG" 2>&1

  # Check if any files were created/modified
  CHANGED=$(git status --porcelain | wc -l | tr -d " ")
  if [ "$CHANGED" -eq 0 ]; then
    echo " -> No changes (enrichment/null-result only)" >> "$LOG"
    continue
  fi

  # Commit (the message body stays unindented so the trailer starts at column 0)
  git add -A
  git commit -m "extract: $BASENAME

Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>" >> "$LOG" 2>&1

  # Push
  git push "http://leo:${TOKEN}@localhost:3000/teleo/teleo-codex.git" "$BRANCH" --force >> "$LOG" 2>&1

  # Create PR
  curl -sf -X POST "http://localhost:3000/api/v1/repos/teleo/teleo-codex/pulls" \
    -H "Authorization: token $TOKEN" \
    -H "Content-Type: application/json" \
    -d "{\"title\":\"extract: $BASENAME\",\"head\":\"$BRANCH\",\"base\":\"main\"}" >> /dev/null 2>&1

  SUCCESS=$((SUCCESS + 1))
  echo " -> SUCCESS ($CHANGED files)" >> "$LOG"

  # Back to main
  git checkout -f main 2>/dev/null

  # Rate limit
  sleep 2
done

|
|
echo "[$(date)] Batch complete: $SUCCESS success, $FAILED failed, $SKIPPED skipped (already attempted)" >> $LOG
|
|
|
|
git checkout -f main 2>/dev/null
|
|
git reset --hard origin/main 2>/dev/null
|
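
The lockfile block near the top of the script tests for the file and then writes it in two separate steps, so two runs starting at the same moment could both pass the test. One possible tightening, shown here as a sketch rather than as the script's actual approach, is to create the file with `noclobber` set so that the existence test and the write are a single operation. The function name `acquire_lock` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: take the lock atomically, or report that it is held.
# Returns 0 if this shell now owns the lock, non-zero otherwise.
acquire_lock() {
  local lockfile=$1 holder

  # With noclobber set, the redirection fails if the file already exists,
  # so "check" and "create" cannot be interleaved by another process.
  if ( set -o noclobber; echo "$$" > "$lockfile" ) 2>/dev/null; then
    return 0
  fi

  # The lock exists: is the recorded process still alive?
  holder=$(cat "$lockfile" 2>/dev/null)
  if [ -n "$holder" ] && kill -0 "$holder" 2>/dev/null; then
    return 1
  fi

  # Stale lock: the recorded process is gone, so remove it and try once more.
  rm -f "$lockfile"
  ( set -o noclobber; echo "$$" > "$lockfile" ) 2>/dev/null
}
```

A caller might use it as `acquire_lock "$LOCKFILE" || exit 0`, followed by the existing `trap 'rm -f "$LOCKFILE"' EXIT`. Stale-lock cleanup still leaves a small window between the `rm` and the retry, so this narrows the race rather than removing it entirely.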