feat: commit-graph index — part 1 (storage + binary format + build) #15

merged colechristensen cole.christensen@gmail.com wants to merge feat/commit-graph-index-part1 into main
No CI

PR 1 of 4 toward fangorn/ex_git_objectstore#26. Adds the foundation for an in-memory commit-graph index so that ahead_behind, commits_between, and is_ancestor? can run in time proportional to the number of commits touched, rather than using the current O(max_walk) walker that issues one `cat_object` per step. Anvil’s PR list/show pages have been saturating prod CPU on that walker (see fangorn/anvil#55). This PR is strictly additive: nothing calls the new code yet.

What changes

Storage behaviour — four new callbacks for generic side-index blobs:

  • `put_blob(config, prefix, key, data)`
  • `get_blob(config, prefix, key)`
  • `delete_blob(config, prefix, key)`
  • `blob_exists?(config, prefix, key)`

Implemented in `Memory`, `Filesystem`, and `S3` backends. Keys are repo-scoped slash-separated paths (e.g. `graph/commit-graph.v1`). All three backends reject `..` traversal and absolute paths.
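The key rules above can be illustrated with a minimal sketch. Everything here is hypothetical: `BlobKeySketch` and `MemoryBlobSketch` are not modules from this PR, the store is a plain map threaded through the calls instead of a `config`, the `{:error, :invalid_key}` and `{:error, :not_found}` shapes are assumed, and rejecting empty or `.` segments is an extra assumption beyond the two rules stated above.

```elixir
defmodule BlobKeySketch do
  # Hypothetical validator: rejects absolute paths and `..` traversal.
  # Rejecting empty and "." segments is an added assumption, not from the PR.
  def validate(key) when is_binary(key) do
    segments = String.split(key, "/")

    cond do
      key == "" -> {:error, :invalid_key}
      String.starts_with?(key, "/") -> {:error, :invalid_key}
      Enum.any?(segments, &(&1 in ["", ".", ".."])) -> {:error, :invalid_key}
      true -> :ok
    end
  end
end

defmodule MemoryBlobSketch do
  # Hypothetical in-memory store: a map keyed by {prefix, key}.
  def put_blob(store, prefix, key, data) when is_binary(data) do
    with :ok <- BlobKeySketch.validate(key) do
      {:ok, Map.put(store, {prefix, key}, data)}
    end
  end

  def get_blob(store, prefix, key) do
    with :ok <- BlobKeySketch.validate(key) do
      case Map.fetch(store, {prefix, key}) do
        {:ok, data} -> {:ok, data}
        :error -> {:error, :not_found}
      end
    end
  end

  def delete_blob(store, prefix, key) do
    with :ok <- BlobKeySketch.validate(key) do
      {:ok, Map.delete(store, {prefix, key})}
    end
  end

  def blob_exists?(store, prefix, key) do
    BlobKeySketch.validate(key) == :ok and Map.has_key?(store, {prefix, key})
  end
end
```

Keying the map by `{prefix, key}` is one simple way to get the prefix isolation the tests check for; the real backends may achieve it differently.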

`ExGitObjectstore.Graph` — struct + persistence:

  • `load/1`, `save/2`, `delete/1` against blob key `graph/commit-graph.v1`.
  • Lookup API: `generation/2`, `corrected_commit_date/2`, `parents/2`, `member?/2`, `size/1`.

`ExGitObjectstore.Graph.BinaryFormat` — serialize/deserialize:

```
Header   (12 B):   magic "ECG1", u32 version=1, u32 commit_count
Fan-out  (1024 B): 256 × u32 (entry i = count of OIDs with first byte ≤ i)
OID table:         N × 20 raw SHA bytes, ascending
Offset table:      N × u32 byte offset into the entries block
Entries  (var):    tree_oid (20) + gen (u32) + ccd (u64) + ctime (u64)
                   + parent_count (u8) + parent_indices (u32 × pc)
```
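As an illustration of the fan-out table in the layout above, here is a small sketch of how the 256 cumulative counts could be built and then used to narrow an OID lookup to one slice of the sorted OID table. `FanOutSketch` and its function names are hypothetical and are not the `BinaryFormat` implementation; only the layout (256 × big-endian u32, entry i = count of OIDs with first byte ≤ i) comes from the description above.

```elixir
defmodule FanOutSketch do
  # Builds the 256 cumulative counts from a list of 20-byte raw OIDs.
  def build(oids) do
    counts = Enum.frequencies_by(oids, fn <<first, _rest::binary>> -> first end)

    {table, _total} =
      Enum.map_reduce(0..255, 0, fn byte, running ->
        total = running + Map.get(counts, byte, 0)
        {total, total}
      end)

    table
  end

  # Serializes the table as 256 big-endian u32 values (1024 bytes).
  def encode(table) when length(table) == 256 do
    for count <- table, into: <<>>, do: <<count::unsigned-big-32>>
  end

  # Returns the half-open index range {lo, hi} in the sorted OID table
  # that holds every OID whose first byte equals `first_byte`.
  def range(fanout, first_byte) when byte_size(fanout) == 1024 and first_byte in 0..255 do
    lo = if first_byte == 0, do: 0, else: entry(fanout, first_byte - 1)
    {lo, entry(fanout, first_byte)}
  end

  defp entry(fanout, i) do
    <<count::unsigned-big-32>> = binary_part(fanout, i * 4, 4)
    count
  end
end
```

A lookup would then binary-search only the `lo..hi-1` slice of the OID table, which is the usual reason for keeping a fan-out in front of a sorted table.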

All integers big-endian. SHAs stored as 20-byte raw; hex at the API boundary. ~45 B/commit for typical one-parent history, plus 24 B in the two lookup tables. Output is deterministic — same graph produces identical bytes.
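The per-commit size follows from the entry layout: 20 + 4 + 8 + 8 + 1 = 41 bytes of fixed fields plus 4 bytes per parent, so 45 bytes with one parent. The sketch below encodes and decodes a single entry with Elixir bitstring syntax to make that arithmetic concrete. `EntrySketch` and the map shape it takes are hypothetical and may differ from what `BinaryFormat` does internally.

```elixir
defmodule EntrySketch do
  # Encodes one entry: tree_oid (20) + gen (u32) + ccd (u64) + ctime (u64)
  # + parent_count (u8) + parent_indices (u32 each), all big-endian.
  def encode(%{tree: tree, gen: gen, ccd: ccd, ctime: ctime, parents: parents})
      when byte_size(tree) == 20 and length(parents) <= 255 do
    parent_count = length(parents)
    parent_bin = for idx <- parents, into: <<>>, do: <<idx::unsigned-big-32>>

    <<tree::binary, gen::unsigned-big-32, ccd::unsigned-big-64, ctime::unsigned-big-64,
      parent_count::unsigned-8, parent_bin::binary>>
  end

  # Decodes one entry from the front of `bin`, returning {entry, remaining_bytes}.
  def decode(<<tree::binary-size(20), gen::unsigned-big-32, ccd::unsigned-big-64,
               ctime::unsigned-big-64, parent_count::unsigned-8, rest::binary>>) do
    parent_bytes = parent_count * 4
    <<parent_bin::binary-size(parent_bytes), tail::binary>> = rest
    parents = for <<idx::unsigned-big-32 <- parent_bin>>, do: idx
    {%{tree: tree, gen: gen, ccd: ccd, ctime: ctime, parents: parents}, tail}
  end
end
```

Because the fields are written in a fixed order with fixed widths, encoding the same map always yields the same bytes, which is the property the determinism tests rely on.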

`ExGitObjectstore.Graph.Builder` — full scan:

  • Collects tips from branches + tags, peels annotated tags to commits.
  • BFS the DAG, one `cat_object` per reachable commit.
  • Kahn’s algorithm over a child-adjacency map to compute generation numbers and corrected commit dates in a single topological pass.
  • Generation: roots = 1, else `max(parent.gen) + 1`.
  • CCD: roots = `commit_time`, else `max(commit_time, max(parent.ccd))`. This is modeled on git’s corrected committer date (git itself uses `max(parent.ccd) + 1`) and handles clock-skewed parents.
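The topological pass described above can be sketched as follows. This is a minimal illustration rather than the `Builder` code: `GenerationSketch`, the `%{oid => %{parents: [...], time: ...}}` input shape, and the output shape are all hypothetical, and the sketch assumes every parent is present in the map (that is, the BFS has already collected the full reachable set).

```elixir
defmodule GenerationSketch do
  # commits: %{oid => %{parents: [oid], time: integer}}
  # Returns %{oid => %{gen: integer, ccd: integer}}.
  def annotate(commits) do
    # Child-adjacency map: parent oid => list of child oids.
    children =
      Enum.reduce(commits, %{}, fn {oid, %{parents: parents}}, acc ->
        Enum.reduce(parents, acc, fn p, inner -> Map.update(inner, p, [oid], &[oid | &1]) end)
      end)

    # Number of not-yet-processed parents per commit (Kahn's in-degree).
    pending = Map.new(commits, fn {oid, %{parents: ps}} -> {oid, length(ps)} end)
    roots = for {oid, 0} <- pending, do: oid

    walk(roots, pending, children, commits, %{})
  end

  defp walk([], _pending, _children, _commits, done), do: done

  defp walk([oid | queue], pending, children, commits, done) do
    %{parents: parents, time: time} = Map.fetch!(commits, oid)
    parent_info = Enum.map(parents, &Map.fetch!(done, &1))

    info =
      case parent_info do
        # Roots: generation 1, CCD equal to the commit time.
        [] -> %{gen: 1, ccd: time}
        _ -> %{gen: max_of(parent_info, :gen) + 1, ccd: max(time, max_of(parent_info, :ccd))}
      end

    # Release children whose parents have now all been processed.
    {pending, ready} =
      children
      |> Map.get(oid, [])
      |> Enum.reduce({pending, []}, fn child, {pend, rdy} ->
        remaining = Map.fetch!(pend, child) - 1
        pend = Map.put(pend, child, remaining)
        if remaining == 0, do: {pend, [child | rdy]}, else: {pend, rdy}
      end)

    walk(queue ++ ready, pending, children, commits, Map.put(done, oid, info))
  end

  defp max_of(infos, key), do: infos |> Enum.map(&Map.fetch!(&1, key)) |> Enum.max()
end
```

A commit is only dequeued after all of its parents, so each parent's `gen` and `ccd` are already available when needed and both values come out of one pass. A child whose `time` is earlier than its parent's (clock skew) inherits the parent's CCD, which keeps CCD non-decreasing along every parent-to-child edge.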

Tests

  • 20 new storage-layer tests across the three backends (round-trip, overwrite, delete, missing key, traversal rejection, prefix isolation).
  • 12 binary-format tests (round-trip for empty/linear/merge/octopus, large u64 timestamps, determinism, header magic + version + truncation rejection, fan-out correctness).
  • 12 build/load/save tests (empty repo, linear history, merge commits with correct generation arithmetic, CCD clock-skew handling, multi-ref reachability, save/load equivalence, delete round-trip, unknown-sha lookups).

Full suite: 660 tests, 0 failures (was 636 on main).

Credo

`mix credo --strict` output is unchanged from main: 12 pre-existing findings, 0 new. I introduced one nesting finding in `Builder.collect_tips/1` and refactored it out before committing. The remaining 12 are pre-existing and outside this PR’s scope; I suggest tracking them separately if we want to drive them to zero.

What’s next (not in this PR)

  • PR 2 — `Graph.ahead_behind/3`, `commits_between/3`, `is_ancestor?/3` on top of the in-memory graph, with property tests against the existing `Walk` module on fixture repos.
  • PR 3 — `Graph.update/3` incremental, wired into `commit_tree/3`, `merge_branches/4`, `squash_merge/4`, `rebase_commits/4`, `cherry_pick/*`, and the receive-pack path. Fuzz test: incremental result == full rebuild.
  • PR 4 — top-level `ExGitObjectstore.{ahead_behind,commits_between,is_ancestor?}` with auto-load + lazy build. This is the API Anvil will switch to (tracked in fangorn/anvil#55).

Closes nothing yet; part of fangorn/ex_git_objectstore#26.

Created Apr 18, 2026 at 14:44 UTC | Merged Apr 18, 2026 at 15:16 UTC by colechristensen cole.christensen@gmail.com