perf(commit-walk): per-process pack cache + pack-first ordering (~120× faster) #24

merged colechristensen cole.christensen@gmail.com wants to merge commit-walk-pack-cache into main
No CI

Summary

Two complementary fixes for slow commit-graph walks (Graph.Fallback.ahead_behind/4, commits_between/4, ancestor?/4).

1. Per-process pack cache

Every ObjectResolver.read/2 call was invoking Repo.storage_call(:get_pack) → File.read/1 on the entire pack file (Filesystem backend), all serialized through Erlang’s singleton :prim_file GenServer. A 3000-commit walk meant 3000 full pack reads.

Memoize four things in the process dictionary:

  • pack data (get_pack)
  • parsed pack index (get_pack_index + Index.parse)
  • index-derived SHA→offset cache (build_sha_cache)
  • pack listing (list_packs)

Process-dict scope is correct because pack files are content-addressed (filename has SHA, contents immutable) and one LiveView mount = one process = one cache lifetime. Public clear_pack_cache/0 for tests / explicit invalidation.
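A minimal sketch of the process-dictionary memoization described above. The module name PackCacheSketch, the memoize/2 helper and the key shapes are hypothetical and chosen for illustration; only the name clear_pack_cache/0 comes from this PR, and its real implementation may differ.

```elixir
defmodule PackCacheSketch do
  # Hypothetical module: illustrates the caching pattern only,
  # it is not the code changed in this PR.

  # Return the value cached under `key` in the calling process,
  # running `loader` and storing its result on the first call.
  def memoize(key, loader) when is_function(loader, 0) do
    dict_key = {:pack_cache_sketch, key}

    case Process.get(dict_key, :miss) do
      :miss ->
        value = loader.()
        Process.put(dict_key, value)
        value

      cached ->
        cached
    end
  end

  # Drop every entry this module stored in the calling process.
  # Other process-dictionary keys are left untouched.
  def clear_pack_cache do
    Process.get_keys()
    |> Enum.filter(&match?({:pack_cache_sketch, _}, &1))
    |> Enum.each(&Process.delete/1)

    :ok
  end
end
```

Under this sketch, each of the four memoized items would get its own key, for example `memoize({:pack, pack_sha}, fn -> load_pack.() end)` where `load_pack` stands in for whatever actually reads the pack. Because the cache lives in the process dictionary, it disappears when the owning process exits.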

2. Pack-first ordering

After (1), profiling showed loose-object existence checks dominating: Object.read was checking for a loose object before falling through to packs. Every commit-walk SHA paid 1× :prim_file round-trip to confirm “no loose file” on a repo where commits are all packed. That is about 90k synchronous round-trips per page mount (roughly 3000 commits × 30 PRs).

Inverts ObjectResolver.read/2 to check packs first, fall through to loose objects on miss. After the cache is warm, pack lookup is Index.lookup/2 (binary search in memory) — no syscall, no GenServer. Loose check now only fires for SHAs not in any pack (recently-written, not-yet-packed case).

Safe by content addressing: a SHA in both loose AND a pack must have byte-identical content. Git itself uses pack-first ordering for the same reason.
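The lookup order described above can be sketched as follows. This is an illustration under assumptions, not the PR's code: PackFirstSketch, the pack map shape (`index`, `read_at`) and the `loose_reader` callback are hypothetical, and a plain map stands in for the parsed index that the real code searches with Index.lookup/2.

```elixir
defmodule PackFirstSketch do
  # Hypothetical module showing pack-first ordering only.

  # `packs` is a list of %{index: %{sha => offset}, read_at: fun};
  # `loose_reader` is a 1-arity function used only on a pack miss.
  def read(sha, packs, loose_reader) do
    case find_in_packs(sha, packs) do
      {:ok, _object} = hit -> hit
      # Only SHAs that are in no pack reach the loose-object path.
      :miss -> loose_reader.(sha)
    end
  end

  # In-memory lookup across all packs; no filesystem access is
  # needed to decide whether a SHA is packed.
  defp find_in_packs(sha, packs) do
    Enum.find_value(packs, :miss, fn pack ->
      case Map.fetch(pack.index, sha) do
        {:ok, offset} -> {:ok, pack.read_at.(offset)}
        :error -> nil
      end
    end)
  end
end
```

With this ordering, a fully packed history never calls `loose_reader` at all, which is where the saved round-trips come from.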

Empirical impact

Profiled Git.ahead_behind for chiron’s 30 open PRs against bare clone of fangorn/chiron (~3000-commit main history, all packed) inside a memory-capped OrbStack container.

| Metric | Before | After pack cache only | After + pack-first |
|---|---|---|---|
| Total wall (30 PRs) | ~10 min (extrapolated) | 11.8 s | 4.9 s |
| Per-PR mean | 19500 ms | 395 ms | 164 ms |
| First call (cold cache) | 28776 ms | 957 ms | 486 ms |
| Subsequent calls | 15-19 s | 400-500 ms | 150-170 ms |
| Flame samples for :gen.do_call/4 | ~22000 | ~2200 | ~38 |

Cumulative ~120× faster end-to-end (per-PR mean 19500 ms → 164 ms). Page that was unusable (~10 min) now loads in ~5 s.

What’s left at 164ms/call

Pure zlib decompression of commit objects. Reader.inflate_prefix_size/4 and :zlib.* calls now dominate the flame graph. That’s the actual algorithmic work of unpacking each commit — unavoidable without batching/parallelism.

Further improvements possible but lower-leverage:

  • Reuse zlib state across object reads (avoid :zlib.inflateInit/inflateEnd per-object)
  • Consumer-side parallelism (Task.async_stream over PRs in Anvil’s compute_ahead_behind/2) — would cut to ~500ms wall via concurrency
  • Architectural fix using :file.pread/3 with :raw mode to bypass :prim_file for uncached cold reads

None of those are in this PR.
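For illustration only, two of the ideas above might look roughly like this. FollowUpSketch and both function names are hypothetical, and neither function reflects how Reader or Anvil's compute_ahead_behind/2 are actually written. The first reuses one zlib stream across objects via :zlib.inflateReset/1; the second runs a per-PR function concurrently with Task.async_stream/3.

```elixir
defmodule FollowUpSketch do
  # Hypothetical module; not part of this PR.

  # Inflate several independently compressed zlib streams while
  # reusing a single zlib handle instead of init/end per object.
  def inflate_all(compressed_list) do
    z = :zlib.open()
    :ok = :zlib.inflateInit(z)

    try do
      Enum.map(compressed_list, fn compressed ->
        data = z |> :zlib.inflate(compressed) |> IO.iodata_to_binary()
        # Reset the stream so the same handle can take the next object.
        :ok = :zlib.inflateReset(z)
        data
      end)
    after
      :zlib.close(z)
    end
  end

  # Apply `fun` to each item concurrently; results keep input order.
  def map_concurrently(items, fun) when is_function(fun, 1) do
    items
    |> Task.async_stream(fun,
      max_concurrency: System.schedulers_online(),
      timeout: 30_000
    )
    |> Enum.map(fn {:ok, result} -> result end)
  end
end
```

One caveat with the concurrent variant: each task runs in its own process, so a process-dictionary cache like the one in this PR would start cold in every task unless the cached data is shared some other way.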

API surface

  • clear_pack_cache/0 — public, drops cached entries in the calling process

Verified

  • mix test — 903/903 pass (51 excluded, same as main)
  • mix format --check-formatted clean
  • mix credo --strict clean on changed file
Created Apr 29, 2026 at 07:23 UTC | Merged Apr 29, 2026 at 13:31 UTC by colechristensen cole.christensen@gmail.com