perf(commit-walk): per-process pack cache + pack-first ordering (~120× faster) #24

Summary

Two complementary fixes for slow commit-graph walks (Graph.Fallback.ahead_behind/4, commits_between/4, ancestor?/4).

1. Per-process pack cache

Every ObjectResolver.read/2 call was invoking Repo.storage_call(:get_pack), which on the Filesystem backend is a File.read/1 of the entire pack file, serialized through Erlang’s singleton file server process (:file_server_2, which wraps :prim_file). A 3000-commit walk therefore meant 3000 full pack reads.

Memoize four things in the process dictionary:

pack data (get_pack)
parsed pack index (get_pack_index + Index.parse)
index-derived SHA→offset cache (build_sha_cache)
pack listing (list_packs)

Process-dictionary scope is safe because pack files are content-addressed (the filename contains the SHA and the contents are immutable), and one LiveView mount is one process, so the cache lives exactly as long as the mount. clear_pack_cache/0 is public for tests and explicit invalidation.
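
As an illustration of the memoization pattern described above, here is a minimal sketch. It is not the code from this PR: the module name PackCacheSketch, the fetch/2 helper, and the cache key are hypothetical, and only clear_pack_cache/0 mirrors a name from the description.

```elixir
defmodule PackCacheSketch do
  @moduledoc false
  # Hypothetical module; it only illustrates process-dictionary memoization.

  # A single process-dictionary key holds a map of all cached entries.
  @cache_key {__MODULE__, :pack_cache}

  # Returns the cached value for `key`. `loader` (a zero-arity function that
  # does the expensive read or parse) runs only on the first miss in the
  # calling process.
  def fetch(key, loader) when is_function(loader, 0) do
    cache = Process.get(@cache_key, %{})

    case cache do
      %{^key => value} ->
        value

      _ ->
        value = loader.()
        Process.put(@cache_key, Map.put(cache, key, value))
        value
    end
  end

  # Drops every cached entry in the calling process only.
  def clear_pack_cache do
    Process.delete(@cache_key)
    :ok
  end
end
```

A caller would wrap each expensive step, for example `PackCacheSketch.fetch({:pack, pack_name}, fn -> read_pack.(pack_name) end)`, where `read_pack` stands in for whatever performs the actual read. Because the process dictionary is per process, no locking is needed, and the cache disappears when the process exits.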

2. Pack-first ordering

After (1), profiling showed loose-object existence checks dominating: Object.read checked for a loose object before falling through to packs. Every SHA in a commit walk paid one synchronous file-server round-trip just to confirm “no loose file”, on a repo where all commits are packed. For the page profiled below (30 PRs × ~3000 commits) that is roughly 90k round-trips per mount.

This PR inverts ObjectResolver.read/2 to check packs first and fall through to loose objects on a miss. Once the cache is warm, a pack lookup is Index.lookup/2 (an in-memory binary search): no syscall, no GenServer call. The loose check now fires only for SHAs not in any pack (the recently-written, not-yet-packed case).

Safe by content addressing: a SHA in both loose AND a pack must have byte-identical content. Git itself uses pack-first ordering for the same reason.
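
The lookup order can be sketched as follows. This is a simplified illustration under stated assumptions, not the PR's ObjectResolver: the module name is hypothetical, a plain map stands in for the warm in-memory pack index and data, and `loose_reader` stands in for the loose-object file read.

```elixir
defmodule PackFirstSketch do
  # Hypothetical stand-ins: `packed` plays the role of the cached pack
  # index/data, `loose_reader` is the expensive loose-object path.
  def read(sha, packed, loose_reader)
      when is_map(packed) and is_function(loose_reader, 1) do
    case Map.fetch(packed, sha) do
      # Hit in a pack: answered from memory, the loose path is never touched.
      {:ok, object} -> {:ok, object}
      # Miss in every pack: only now pay for the loose-object lookup.
      :error -> loose_reader.(sha)
    end
  end
end
```

With this ordering, a walk over fully packed history never invokes `loose_reader`, while a recently written object that has not been packed yet is still found through the fallback.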

Empirical impact

Profiled Git.ahead_behind for chiron’s 30 open PRs against a bare clone of fangorn/chiron (~3000-commit main history, all packed) inside a memory-capped OrbStack container.

| Metric | Before | After pack cache only | After + pack-first |
|---|---|---|---|
| Total wall (30 PRs) | ~10 min (extrapolated) | 11.8 s | 4.9 s |
| Per-PR mean | 19500 ms | 395 ms | 164 ms |
| First call (cold cache) | 28776 ms | 957 ms | 486 ms |
| Subsequent calls | 15-19 s | 400-500 ms | 150-170 ms |
| Flame samples for `:gen.do_call/4` | ~22000 | ~2200 | ~38 |

Cumulative speedup is ~120× end-to-end (per-PR mean 19500 ms → 164 ms). A page that was unusable (~10 min) now loads in about 5 s.

What’s left at 164ms/call

Pure zlib decompression of commit objects. Reader.inflate_prefix_size/4 and :zlib.* calls now dominate the flame graph. That’s the actual algorithmic work of unpacking each commit — unavoidable without batching/parallelism.

Further improvements possible but lower-leverage:

Reuse zlib state across object reads (avoid :zlib.inflateInit/inflateEnd per-object)
Consumer-side parallelism (Task.async_stream over PRs in Anvil’s compute_ahead_behind/2), which would cut wall time for the page to ~500 ms via concurrency
Architectural fix: open the pack with :file.open/2 in :raw mode and read with :file.pread/3, bypassing the file server for uncached cold reads

None of those are in this PR.
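
For reference only, the consumer-side parallelism idea could look roughly like the sketch below. The module name, `run/3`, and the `compute` callback are hypothetical; `compute` stands in for one ahead/behind computation for a single PR.

```elixir
defmodule ParallelAheadBehindSketch do
  # Hypothetical helper: runs `compute` once per PR, concurrently, and
  # returns the results in the same order as `prs`.
  def run(prs, compute, opts \\ []) when is_function(compute, 1) do
    max_concurrency = Keyword.get(opts, :max_concurrency, System.schedulers_online())

    prs
    |> Task.async_stream(compute,
      max_concurrency: max_concurrency,
      timeout: 30_000,
      ordered: true
    )
    # With the default on_timeout: :exit, a timeout crashes the caller,
    # so every element that reaches this point is {:ok, result}.
    |> Enum.map(fn {:ok, result} -> result end)
  end
end
```

One design consequence worth noting: each task runs in its own process, so a process-dictionary cache would start cold in every worker rather than being shared across PRs.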

API surface

clear_pack_cache/0 — public, drops cached entries in the calling process

Verified

mix test — 903/903 pass (51 excluded, same as main)
mix format --check-formatted clean
mix credo --strict clean on changed file