ref:03224cd34075cd4a0bb08807f7d82f3e5a5a7dc2

perf(commit-walk): pack-first ordering eliminates loose-object miss cost

After the per-process pack cache landed, profile showed ~400ms per ahead_behind call (down from 17s), and the dominant remaining cost was loose-object existence checks. Object.read(repo, sha) was hitting :prim_file via File.read for EVERY commit in the walk, even when those SHAs are in a pack. On chiron's mostly-packed history that's 90k synchronous round-trips to a singleton GenServer for files that don't exist.

Inverts the lookup order in ObjectResolver.read/2:
- was: loose first, packs as fallback
- now: packs first, loose as fallback

Pack lookup after the cache warmup is a cached Index.lookup/2 (binary search in memory): no syscall, no GenServer. Falling through to the loose check only happens when a SHA isn't in any pack, which is the recently-written-not-yet-packed case.

Safe by content addressing: if a SHA exists both loose AND packed, both must contain byte-identical data (otherwise SHAs wouldn't match). Git itself uses pack-first ordering for the same performance reason.

Verified: full suite 903/903 green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
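The content-addressing argument can be illustrated with a minimal sketch. It assumes SHA-1 object IDs (consistent with the 40-character SHA on this commit) and Git's usual blob framing of a `blob <size>` header, a NUL byte, then the content. The module name `ContentAddressSketch` is hypothetical and not part of this repository.

```elixir
defmodule ContentAddressSketch do
  @moduledoc false

  # Git-style blob ID: SHA-1 over "blob <size>", a NUL byte, then the content.
  # The ID depends only on the bytes, not on where the object is stored.
  def blob_id(content) when is_binary(content) do
    framed = ["blob ", Integer.to_string(byte_size(content)), 0, content]

    :crypto.hash(:sha, framed)
    |> Base.encode16(case: :lower)
  end
end

# Two copies of the same bytes, imagined as one loose and one packed.
loose_copy = "hello\n"
packed_copy = "hello\n"

IO.puts(ContentAddressSketch.blob_id(loose_copy))
IO.puts(ContentAddressSketch.blob_id(packed_copy))
```

Both calls print the same ID. A copy with different bytes would hash to a different ID, so it could never be returned for this SHA, which is why the lookup order cannot change the result.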
SHA: 03224cd34075cd4a0bb08807f7d82f3e5a5a7dc2
Author: Cole Christensen <cole.christensen@macmillan.com>
Date: 2026-04-29 13:09
Parents: d12a4cf
1 file changed +16 −3
lib/ex_git_objectstore/object_resolver.ex +16 −3
@@ -55,16 +55,29 @@
   @pack_list_key :exgo_pack_list_cache
   @doc """
+  Read an object by SHA. Checks packs first (cheap after the per-process
+  pack-index cache warms up), falls back to loose objects on miss.
+  Why packs-first: graph-walk callers (e.g. `Graph.Fallback.do_collect/4`)
+  read every commit in a history. On packed repos, every loose-object
+  check that runs FIRST is a synchronous round-trip to Erlang's singleton
+  `:prim_file` GenServer for a file that doesn't exist. Pack lookup, by
+  contrast, is a cached `Index.lookup/2` (binary search in memory) — no
+  syscall, no GenServer.
-  Read an object by SHA, checking loose objects first, then packs.
+  Pack-first is also safe by content addressing: if a SHA exists both
+  loose AND in a pack, both must contain byte-identical data (or the
+  SHAs wouldn't match), so either one is correct. Git itself uses this
+  ordering.
   """
   @spec read(Repo.t(), String.t()) :: {:ok, Object.t()} | {:error, term()}
   def read(%Repo{} = repo, sha) do
-    case Object.read(repo, sha) do
+    case read_from_packs(repo, sha) do
       {:ok, _} = result ->
         result
       {:error, :not_found} ->
-        read_from_packs(repo, sha)
+        Object.read(repo, sha)
       {:error, _} = err ->
         err
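
The effect of the reordering can be shown in a self-contained sketch. Everything here is a stand-in and not the project's code: `PackFirstSketch`, the sorted-tuple index, and the `loose_reader` function are hypothetical, and the real `Index.lookup/2` and `Object.read/2` may work differently. The index is a sorted tuple searched by binary search, and the loose store is a function that counts how often it is probed.

```elixir
defmodule PackFirstSketch do
  @moduledoc false

  # Binary search over a sorted tuple of SHAs: pure in-memory work,
  # with no file access and no message to another process.
  def index_member?(index, sha) when is_tuple(index) do
    search(index, sha, 0, tuple_size(index) - 1)
  end

  defp search(_index, _sha, lo, hi) when lo > hi, do: false

  defp search(index, sha, lo, hi) do
    mid = div(lo + hi, 2)

    case elem(index, mid) do
      ^sha -> true
      other when other < sha -> search(index, sha, mid + 1, hi)
      _ -> search(index, sha, lo, mid - 1)
    end
  end

  # Packs first. The loose reader is only called on a pack miss.
  def read(packs, loose_reader, sha) do
    case read_from_packs(packs, sha) do
      {:ok, _} = result -> result
      {:error, :not_found} -> loose_reader.(sha)
    end
  end

  defp read_from_packs(%{index: index, objects: objects}, sha) do
    if index_member?(index, sha) do
      {:ok, Map.fetch!(objects, sha)}
    else
      {:error, :not_found}
    end
  end
end

# Hypothetical data: two packed objects and one loose object.
objects = %{"aa11" => "commit A", "bb22" => "commit B"}
index = objects |> Map.keys() |> Enum.sort() |> List.to_tuple()
packs = %{index: index, objects: objects}

# Counts loose probes so the cost of each read is visible.
{:ok, counter} = Agent.start_link(fn -> 0 end)

loose_reader = fn sha ->
  Agent.update(counter, &(&1 + 1))
  if sha == "cc33", do: {:ok, "loose C"}, else: {:error, :not_found}
end

IO.inspect(PackFirstSketch.read(packs, loose_reader, "aa11"))
IO.inspect(PackFirstSketch.read(packs, loose_reader, "cc33"))
IO.inspect(Agent.get(counter, & &1))
```

In this sketch a packed SHA never reaches the loose reader, so the probe counter only advances for objects missing from every pack. That is the behaviour the commit relies on for mostly-packed histories.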