perf(graph): batched ahead_behind_many — walk base ancestors once for N heads #25

merged colechristensen cole.christensen@gmail.com wants to merge graph-ahead-behind-perf into main
No CI

Why

`ahead_behind/3` walks `ancestors(base) ∪ ancestors(head)` from scratch on every call. When a caller asks the same question for many heads against one base (the typical case is a PR-list page where every PR has `base = main`), each call re-walks `ancestors(base)`, so the base side is walked N times.

For Anvil’s chiron PR-list (393-commit main, 70 PRs), this is a 70× redundancy on the base side: 70 × 393 = 27,510 node visits where 393 would suffice.

What

New `ExGitObjectstore.ahead_behind_many(repo, base_sha, head_shas)` that walks `ancestors(base)` once into a set, then for each head:

  1. BFS from head, classifying each commit as in-base (merge point) or not (head-only → ahead).
  2. DOWN-BFS within base_ancestors from the merge points to size the intersection; `behind = |base_ancestors| - |intersection|`.

Cost goes from `O(N · |ancestors(base)|)` to `O(|ancestors(base)| + Σ head walks)`.
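
The two steps above can be sketched over an in-memory parent map. This is an illustration under stated assumptions, not the code in this PR: the module name `AheadBehindSketch`, the `%{sha => [parent_sha]}` graph shape, the `%{head => {ahead, behind}}` return shape, and the reading of "DOWN-BFS" as a walk along parent links are all hypothetical.

```elixir
defmodule AheadBehindSketch do
  @moduledoc "Illustrative only: batched ahead/behind over a parent map."

  # Returns %{head => {ahead, behind}} (assumed shape, not the real API's).
  def ahead_behind_many(parents, base, heads) do
    # ancestors(base), including base itself, is walked exactly once.
    base_ancestors = reachable(parents, [base])
    in_base? = &MapSet.member?(base_ancestors, &1)

    Map.new(heads, fn head ->
      # Step 1: BFS from head; commits already in base are merge points
      # and are not expanded, everything else counts as ahead.
      {ahead, merge_points} =
        classify(parents, [head], in_base?, MapSet.new(), MapSet.new())

      # Step 2: walk parent links from the merge points, staying inside
      # base_ancestors, to size the shared history.
      shared = reachable(parents, merge_points, in_base?)

      {head, {MapSet.size(ahead), MapSet.size(base_ancestors) - MapSet.size(shared)}}
    end)
  end

  # Collects `starts` and everything reachable through parent links,
  # visiting only commits for which `keep?` returns true.
  def reachable(parents, starts, keep? \\ fn _sha -> true end) do
    walk(parents, Enum.filter(starts, keep?), MapSet.new(), keep?)
  end

  defp walk(_parents, [], seen, _keep?), do: seen

  defp walk(parents, [sha | rest], seen, keep?) do
    if MapSet.member?(seen, sha) do
      walk(parents, rest, seen, keep?)
    else
      next = parents |> Map.get(sha, []) |> Enum.filter(keep?)
      walk(parents, rest ++ next, MapSet.put(seen, sha), keep?)
    end
  end

  defp classify(_parents, [], _in_base?, ahead, merges), do: {ahead, merges}

  defp classify(parents, [sha | rest], in_base?, ahead, merges) do
    cond do
      MapSet.member?(ahead, sha) or MapSet.member?(merges, sha) ->
        classify(parents, rest, in_base?, ahead, merges)

      in_base?.(sha) ->
        classify(parents, rest, in_base?, ahead, MapSet.put(merges, sha))

      true ->
        queue = rest ++ Map.get(parents, sha, [])
        classify(parents, queue, in_base?, MapSet.put(ahead, sha), merges)
    end
  end
end

# Toy history: main is a <- b <- c; one branch forks at b (d, e),
# another sits directly on top of c (f).
parents = %{
  "a" => [], "b" => ["a"], "c" => ["b"],
  "d" => ["b"], "e" => ["d"], "f" => ["c"]
}

result = AheadBehindSketch.ahead_behind_many(parents, "c", ["e", "f", "c"])
IO.inspect(result)
# "e" is 2 ahead / 1 behind, "f" is 1 ahead / 0 behind, "c" is 0 / 0
```

In this sketch a SHA missing from the map is treated as a commit with no parents; how the real function reports a missing head or base is not shown here.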

Public API uses the standard graph/fallback routing:

  • Graph available + base in graph → batched fast path. Heads not in graph (just-pushed branches) fall back per-head.
  • No graph → every head goes through per-head `ahead_behind/3` (which already has its cat_object walker fallback).
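
The routing rules above could look roughly like the following. The module name `RoutingSketch`, the `plan/3` function, and the representation of the graph as `nil` or a `MapSet` of covered SHAs are hypothetical stand-ins; the real wrapper's internals are not shown in this description.

```elixir
defmodule RoutingSketch do
  @moduledoc "Illustrative only: decide which heads take which path."

  # No graph at all: every head goes through the per-head path.
  def plan(nil, _base, heads), do: %{batched: [], per_head: heads}

  def plan(graph, base, heads) do
    if MapSet.member?(graph, base) do
      # Heads the graph knows about are batched; the rest
      # (for example just-pushed branches) fall back per-head.
      {batched, per_head} = Enum.split_with(heads, &MapSet.member?(graph, &1))
      %{batched: batched, per_head: per_head}
    else
      # Assumption: the description does not say what happens when the
      # base is missing from the graph, so this sketch falls back per-head.
      %{batched: [], per_head: heads}
    end
  end
end

graph = MapSet.new(["main", "pr1", "pr2"])
IO.inspect(RoutingSketch.plan(graph, "main", ["pr1", "fresh", "pr2"]))
# batched: ["pr1", "pr2"], per_head: ["fresh"]
```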

Numbers

Measured against chiron (393-commit main, 70 PRs, graph built):

| Approach | Time |
|---|---|
| 70 × per-head `ahead_behind` | 33 ms |
| Batched `ahead_behind_many` | 6 ms (5.5×) |

Note: chiron’s graph wasn’t actually built in production, so today’s `ahead_behind` per-call cost is much worse (~50 ms via cat_object fallback). `mix anvil.graphs.rebuild --only fangorn/chiron` would already drop per-head from ~50 ms to ~0.4 ms; this batched API is on top of that.

Tests

  • 6 new tests in `graph/queries_test.exs` cross-checking against unbatched `ahead_behind/3` for the chiron-shape workload, diverged heads, equal-to-base, missing head, missing base, empty list.
  • 3 new tests in `graph_integration_test.exs` covering the wrapper’s three routing cases (full graph hit, partial fallback for just-pushed heads, no-graph total fallback).
  • Full suite: 912 tests, 0 failures.

Test plan

  • CI green
  • Anvil PR (companion) bumps the dep + wires `compute_ahead_behind/2` to use `ahead_behind_many`

🤖 Generated with Claude Code

Created Apr 30, 2026 at 04:45 UTC | Merged Apr 30, 2026 at 05:01 UTC by colechristensen cole.christensen@gmail.com