A repo that cleans up after itself — without deleting the wrong branch
Subtitle: Deleting branches in bulk is a one-keystroke disaster waiting to happen. The fix is to make deletion the last step of a careful classification — and to cap, defer, and audit everything.
The Self-Driving Repo · Part 8 — Housekeeping

Old branches are the junk drawer of a repo. Every merged feature, every abandoned experiment, every "let me just try something" leaves one behind. They pile up silently — clones get slower, branch pickers get unusable, and git branch -r scrolls for a screen and a half. On a long-lived repo with a busy team, you wake up one day with hundreds of dead branches.
The naive fix — a script that deletes anything old — is how you delete a release branch someone needed, or a long-running integration branch that just hadn't been touched in a while. With deletion, "oops" is expensive. You don't get to be cavalier.
So I built branch cleanup as a classification problem first and a deletion problem second. The deleting is three lines. The safety is everything before it.
The problem
Unmanaged branches cause slow, compounding pain:
1. Clutter that slows people down. Hundreds of stale refs make branch lists and tooling sluggish and confusing.
2. No signal about what's safe to remove. Is this branch merged? Abandoned? An active long-runner that's just quiet this week? You can't tell at a glance, so nobody cleans up at all.
3. Cleanup is scary, so it never happens. The risk of nuking something important means the junk drawer just grows forever.
The idea
A scheduled workflow — Sunday at 4am, when nobody's pushing — that fetches every branch and the full PR history, then sorts each branch into a category before a single deletion happens:
- Sacred: master, develop, the default branch. Never touched.
- Protected: anything with branch protection on. Never touched.
- Open-PR: the head or base of an open PR. Active; leave it.
- Merged: already merged via a PR. Safe to delete.
- Stale: last commit older than a threshold (default ~2 years). Safe to delete.
- Active: recent commits, no reason to remove. Keep.
Only the Merged and Stale buckets get deleted, and even then, carefully.
on:
  schedule:
    - cron: '0 4 * * 0' # Sunday 04:00 UTC
  workflow_dispatch:
    inputs:
      age_days: { default: '730' } # ~2 years
      max_deletions: { default: '50' } # hard cap per run
How it works

Build the world before you judge it
You can't classify a branch in isolation — you need to know whether it backs an open PR, whether it was merged, and how old its tip is. So the first step assembles the full picture: all open PRs (to collect their head/base refs), all merged PRs (to know what's been merged and by whom), and every branch with its last-commit timestamp and protection flag.
# Every branch with sha + protected flag, as TSV
gh api repos/$REPO/branches --paginate \
| jq -r '.[] | [.name, .commit.sha, (.protected|tostring)] | @tsv'
Classify with a strict precedence
Each branch runs through the categories in priority order — and safety always wins ties. Sacred beats protected beats open-PR beats merged beats stale. A branch that is both "merged" and "backs an open PR" is kept, because the open-PR check comes first. The ordering is the safety policy:
if in_list "$NAME" "$SACRED"; then echo "SKIP:sacred"
elif [ "$PROTECTED" = "true" ]; then echo "SKIP:protected"
elif in_list "$NAME" "$OPEN_REFS"; then echo "SKIP:open-pr"
elif is_merged "$NAME"; then echo "DELETE:merged"
elif older_than "$CUTOFF" "$NAME"; then echo "DELETE:stale"
else echo "KEEP:active"
fi
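The precedence chain above calls helpers it never defines. Here is a minimal sketch of two of them, assuming the sacred and open-PR lists are newline-separated strings and that merged PRs sit in a tab-separated file whose first column is the head ref. The function bodies and the MERGED_TSV variable are illustrative assumptions, not necessarily the author's code.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the classifier; names and file layout are assumptions.

# in_list NAME LIST
# Succeeds only if NAME is exactly one whole line of the newline-separated LIST.
# -F: fixed string (no regex), -x: whole-line match, -q: quiet.
in_list() {
  printf '%s\n' "$2" | grep -Fxq -- "$1"
}

# is_merged NAME
# Succeeds if NAME is the first tab-separated field of any row in MERGED_TSV,
# meaning some merged PR used this branch as its head.
MERGED_TSV="${MERGED_TSV:-/tmp/merged_prs.tsv}"
is_merged() {
  awk -F '\t' -v name="$1" \
    '$1 == name { found = 1; exit } END { exit !found }' "$MERGED_TSV"
}
```

Exact whole-line matching matters here: a substring match would let a branch named dev pass as develop and be treated as sacred, or worse, let a live branch match a merged one.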
Stale detection is just epoch math — compare the tip commit's timestamp to a cutoff:
CUTOFF=$(( $(date -u +%s) - AGE_DAYS * 86400 ))
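To make the comparison concrete, here is a small sketch of the check itself. It assumes GNU date (as on Ubuntu runners) and that the tip commit's timestamp is already available as an ISO 8601 string. The helper name is_older_than is hypothetical, and how the timestamp gets fetched is left as a comment.

```shell
#!/usr/bin/env bash
# Sketch only: assumes GNU date, which accepts ISO 8601 input via -d.

# is_older_than CUTOFF_EPOCH ISO_TIMESTAMP
# Succeeds if the timestamp is strictly earlier than the cutoff.
# Returns 2 if the timestamp cannot be parsed, so a bad date never reads as "stale".
is_older_than() {
  local cutoff="$1" tip_epoch
  tip_epoch=$(date -u -d "$2" +%s) || return 2
  [ "$tip_epoch" -lt "$cutoff" ]
}

AGE_DAYS="${AGE_DAYS:-730}"
CUTOFF=$(( $(date -u +%s) - AGE_DAYS * 86400 ))

# One way to obtain the timestamp (assumption; not executed here):
#   gh api "repos/$REPO/commits/$SHA" --jq '.commit.committer.date'
```

Treating an unparseable date as an error rather than as "old" keeps the failure mode on the safe side: the branch falls through to KEEP instead of DELETE.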
Cap and defer — never delete unboundedly
Here's the rule that turns a bulk operation from terrifying into routine: never delete more than N in one run. If 400 branches qualify, delete the first 50 and defer the rest to next week. A capped, repeated cleanup converges safely; an uncapped one is a single command that can wreck a repo. The cap is a circuit breaker against a bad classification: if something's wrong, you've harmed 50 branches, not 400, and you'll see it before the next run.
The deferred branches are logged, not lost — the run reports exactly what it skipped for capacity so there are no surprises.
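The cap-and-defer split can be sketched in a few lines, assuming the delete candidates are already written one per line to a file. The function name and the output file arguments are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Sketch of cap-and-defer; function name and file arguments are hypothetical.

# split_by_cap MAX CANDIDATES DELETE_OUT DEFER_OUT
# Writes the first MAX lines of CANDIDATES to DELETE_OUT (acted on this run)
# and every remaining line to DEFER_OUT (logged, retried on the next run).
split_by_cap() {
  local max="$1" candidates="$2" delete_out="$3" defer_out="$4"
  head -n "$max" "$candidates" > "$delete_out"
  tail -n "+$(( max + 1 ))" "$candidates" > "$defer_out"
}
```

Because the deferred set is written out instead of silently dropped, the run report can list exactly which branches were skipped for capacity.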
Delete carefully, link back, and audit
Even the deletion step is defensive. Branch names contain slashes and the occasional oddity, so they're URL-encoded before the API call. And when a merged branch is deleted, the workflow comments on its original PR — closing the loop so there's a trail from "this branch is gone" back to "here's why":
ENC=$(jq -rn --arg x "$NAME" '$x|@uri') # handle slashes/special chars
gh api -X DELETE "repos/$REPO/git/refs/heads/$ENC"
# then, for merged branches, comment on the source PR: "branch auto-deleted by cleanup"
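The link-back comment needs the PR number for the deleted branch. A possible sketch follows, assuming the merged-PR list is a tab-separated file with columns head ref, PR number, merged-at timestamp, and author. The function names and the comment wording are hypothetical; gh pr comment is the GitHub CLI command for posting the comment.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and comment text are assumptions.
MERGED_TSV="${MERGED_TSV:-/tmp/merged_prs.tsv}"

# pr_for_branch NAME
# Prints the number of the most recently merged PR whose head ref was NAME.
# ISO 8601 timestamps sort correctly as plain strings, so a string compare suffices.
pr_for_branch() {
  awk -F '\t' -v name="$1" \
    '$1 == name && $3 > latest { latest = $3; pr = $2 }
     END { if (pr != "") print pr }' "$MERGED_TSV"
}

# comment_on_source_pr NAME
# Leaves a trail on the source PR; does nothing if no merged PR is found.
comment_on_source_pr() {
  local pr
  pr=$(pr_for_branch "$1")
  [ -n "$pr" ] || return 0
  gh pr comment "$pr" --repo "$REPO" \
    --body "Branch \`$1\` was auto-deleted by the scheduled cleanup (reason: merged)."
}
```

Picking the latest merge handles branch names that were reused across several PRs, so the comment lands on the PR most likely to be the one people look up.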
Every action lands in two places: a digest posted to team chat (counts, grouped by reason, with author attribution) and a full job summary with tables of what was deleted, what was deferred, and what was kept. Bulk deletion without an audit trail is how you lose the ability to answer "what happened to my branch?" — so the trail is non-negotiable.
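As one illustration of the job-summary half of that trail, the sketch below appends a Markdown table to the file named by GITHUB_STEP_SUMMARY, which GitHub Actions renders on the run page. The three-column input layout (branch, reason, author) and the function name are assumptions for the example.

```shell
#!/usr/bin/env bash
# Sketch only: the TSV layout (branch, reason, author) is an assumption.

# write_summary TSV_FILE TITLE
# Appends a titled Markdown table, one row per TSV line, to the job summary.
write_summary() {
  local tsv="$1" title="$2"
  {
    printf '### %s (%s)\n\n' "$title" "$(wc -l < "$tsv" | tr -d ' ')"
    printf '| Branch | Reason | Author |\n|---|---|---|\n'
    awk -F '\t' '{ printf "| `%s` | %s | %s |\n", $1, $2, $3 }' "$tsv"
    printf '\n'
  } >> "$GITHUB_STEP_SUMMARY"
}
```

Calling it once each for the deleted, deferred, and kept sets would produce the three tables described above.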
What it bought us
- The junk drawer stays empty without anyone doing scary manual cleanup.
- Zero accidents, because deletion is the last step of a conservative classification, not a blunt age filter.
- A clear record. Every deletion is attributable and explained, with merged branches linked back to their PRs.
- It self-throttles. A big backlog clears over several capped runs instead of one heart-stopping purge — around 100 dead branches reclaimed.
Gotchas & trade-offs
- "Merged" is subtler than it looks. Squash-merges don't leave the branch as an ancestor of master, so ancestry checks lie. Rely on the PR merge record, not git branch --merged.
- Protection flags are only as good as your config. The workflow honors branch protection, but if an important long-runner isn't protected and goes quiet past the threshold, it qualifies as stale. Protect what matters, or widen "sacred."
- API pagination has real limits. Listing thousands of PRs/branches can truncate; the job logs when a list may be incomplete rather than pretending it saw everything. Trust-but-verify your counts on large repos.
- The cap is a feature; resist removing it. "Just clean it all up at once" is the request that precedes the incident. Defer-and-repeat is slower and far safer.
- Schedule it for quiet hours. Deleting branches while people push invites races. Off-peak isn't just politeness; it reduces edge cases.
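The squash-merge gotcha above is easy to reproduce in a throwaway repository. The sketch below assumes git is installed, works only in a temporary directory, and uses a made-up branch name. It squash-merges a feature branch into master and then asks git which branches count as merged.

```shell
#!/usr/bin/env bash
# Demonstration sketch: shows that a squash-merged branch is not reported
# by an ancestry check. Uses a temp dir; touches no real repository.

demo_squash_merge() {
  local dir
  dir=$(mktemp -d) || return 1
  (
    set -e
    cd "$dir"
    # Commit with a throwaway identity and signing off, independent of local config.
    g() { git -c user.name=demo -c user.email=demo@example.com -c commit.gpgsign=false "$@"; }
    git init -q
    git symbolic-ref HEAD refs/heads/master   # force the initial branch name
    g commit -q --allow-empty -m "root"
    git checkout -q -b feature/x
    echo change > file.txt
    git add file.txt
    g commit -q -m "feature work"
    git checkout -q master
    git merge --squash feature/x > /dev/null 2>&1   # stages changes, creates no merge commit
    g commit -q -m "feature work (squashed)"
    # Ancestry-based view of what is merged into master:
    git branch --merged master --format='%(refname:short)'
  )
  rm -rf "$dir"
}
```

Running demo_squash_merge lists only master: feature/x is missing even though its changes are on master, which is why the PR merge record is the more reliable signal.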
Takeaway
Any automation that deletes should be a classifier with a delete step bolted on the end — never a delete step with a filter. Order your categories so safety wins every tie, cap the blast radius per run, defer the overflow, and audit every action with a trail back to why. Make the destructive thing boring and reversible-in-spirit, and bulk cleanup stops being a thing you're afraid to run.
Next, the finale: aggregating git and Jira into per-person weekly summaries with an LLM — and the hard conversation about doing that responsibly.
The complete workflow
Here is the full, genericized workflow — drop it into .github/workflows/ and replace the placeholders (your-org, the PROJ project key, <@DISCORD_USER_ID>, the example team, and the secret names) with your own.
.github/workflows/cleanup-stale-branches.yml
name: Cleanup Stale Branches
on:
  schedule:
    - cron: '0 4 * * 0' # Every Sunday at 04:00 UTC
  workflow_dispatch:
    inputs:
      age_days:
        description: 'Delete branches with no commits in N days'
        required: false
        default: '730'
      max_deletions:
        description: 'Hard cap on branches deleted per run'
        required: false
        default: '50'
concurrency:
  group: cleanup-stale-branches
  cancel-in-progress: false
permissions:
  contents: write
  pull-requests: write
jobs:
  cleanup:
    runs-on: ubuntu-slim
    timeout-minutes: 30
    steps:
      - name: Classify branches
        id: classify
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          REPO: ${{ github.repository }}
          DEFAULT_BRANCH: ${{ github.event.repository.default_branch }}
          AGE_DAYS: ${{ inputs.age_days || '730' }}
          MAX_DELETE: ${{ inputs.max_deletions || '50' }}
        run: |
          set -euo pipefail
          CUTOFF_EPOCH=$(( $(date -u +%s) - AGE_DAYS * 86400 ))
          echo "Cutoff: $(date -u -d "@$CUTOFF_EPOCH" +%Y-%m-%d) (age_days=$AGE_DAYS)"
          SKIP_ALWAYS=$(printf '%s\n' master develop "$DEFAULT_BRANCH" | sort -u)
          OPEN_PR_LIMIT=1000
          gh pr list --repo "$REPO" --state open --limit "$OPEN_PR_LIMIT" \
            --json headRefName,baseRefName > /tmp/open_prs.json
          OPEN_PR_COUNT=$(jq 'length' /tmp/open_prs.json)
          if [ "$OPEN_PR_COUNT" -ge "$OPEN_PR_LIMIT" ]; then
            echo "::error::Open PR count ($OPEN_PR_COUNT) reached limit ($OPEN_PR_LIMIT); list may be truncated and untracked open-PR refs could be deleted. Raise OPEN_PR_LIMIT and re-run."
            exit 1
          fi
          jq -r '.[].headRefName, .[].baseRefName' /tmp/open_prs.json | sort -u > /tmp/open_refs.txt
          echo "Open PRs: $OPEN_PR_COUNT | unique refs: $(wc -l < /tmp/open_refs.txt | tr -d ' ')"
          MERGED_PR_LIMIT=5000
          gh pr list --repo "$REPO" --state merged --limit "$MERGED_PR_LIMIT" \
            --json headRefName,number,mergedAt,author \
            --jq '.[] | "\(.headRefName)\t\(.number)\t\(.mergedAt)\t\(.author.login // "unknown")"' \
            | sort -u > /tmp/merged_prs.tsv
          MERGED_PR_COUNT=$(wc -l < /tmp/merged_prs.tsv | tr -d ' ')
          if [ "$MERGED_PR_COUNT" -ge "$MERGED_PR_LIMIT" ]; then
            echo "::warning::Merged PR list at limit ($MERGED_PR_LIMIT); some merged-via-pr deletions may be missed. Raise MERGED_PR_LIMIT if needed."
          fi
          echo "Merged PR head refs: $MERGED_PR_COUNT"
          gh api "repos/$REPO/branches" --paginate \
            --jq '.[] | [.name, .commit.sha, (.protected|tostring)] | @tsv' > /tmp/branches.tsv
          echo "Remote branches: $(wc -l < /tmp/branches.tsv | tr -d ' ')"
          : > /tmp/delete.tsv
          : > /tmp/keep.tsv
while IFS=