ref:330a5b000540ce23b68c27d3bd2c3c87d0eab822

feat: commit-graph index — part 1 (storage + binary format + build) (#15)

PR 1 of 4 toward fangorn/ex_git_objectstore#26. Adds the foundation for an in-memory commit-graph index so that `ahead_behind`, `commits_between`, and `is_ancestor?` can run proportional to the number of commits touched, rather than the current O(max_walk) `cat_object`-per-step walker. Anvil's PR list/show pages have been saturating prod CPU on that walker; see fangorn/anvil#55.

This PR is strictly additive: no callers yet.

## What changes

**Storage behaviour** (four new callbacks for generic side-index blobs):

- `put_blob(config, prefix, key, data)`
- `get_blob(config, prefix, key)`
- `delete_blob(config, prefix, key)`
- `blob_exists?(config, prefix, key)`

Implemented in `Memory`, `Filesystem`, and `S3` backends. Keys are repo-scoped slash-separated paths (e.g. `graph/commit-graph.v1`). All three backends reject `..` traversal and absolute paths.

**`ExGitObjectstore.Graph`** (struct + persistence):

- `load/1`, `save/2`, `delete/1` against blob key `graph/commit-graph.v1`.
- Lookup API: `generation/2`, `corrected_commit_date/2`, `parents/2`, `member?/2`, `size/1`.

**`ExGitObjectstore.Graph.BinaryFormat`** (serialize/deserialize):

```
Header (12 B):     magic "ECG1", u32 version=1, u32 commit_count
Fan-out (1024 B):  256 × u32 (entry i = count of OIDs with first byte ≤ i)
OID table:         N × 20 raw SHA bytes, ascending
Offset table:      N × u32 byte offset into the entries block
Entries (var):     tree_oid (20) + gen (u32) + ccd (u64) + ctime (u64)
                   + parent_count (u8) + parent_indices (u32 × pc)
```

All integers big-endian. SHAs stored as 20-byte raw; hex at the API boundary. ~45 B/commit for typical one-parent history, plus 24 B in the two lookup tables. Output is deterministic: the same graph produces identical bytes.

**`ExGitObjectstore.Graph.Builder`** (full scan):

- Collects tips from branches + tags, peels annotated tags to commits.
- BFS the DAG, one `cat_object` per reachable commit.
- Kahn's algorithm over a child-adjacency map to compute generation numbers and corrected commit dates in a single topological pass.
  - Generation: roots = 1, else `max(parent.gen) + 1`.
  - CCD: roots = `commit_time`, else `max(commit_time, max(parent.ccd))`; this matches git's topologically-consistent committer date and handles clock-skewed parents.

## Tests

- 20 new storage-layer tests across the three backends (round-trip, overwrite, delete, missing key, traversal rejection, prefix isolation).
- 12 binary-format tests (round-trip for empty/linear/merge/octopus, large u64 timestamps, determinism, header magic + version + truncation rejection, fan-out correctness).
- 12 build/load/save tests (empty repo, linear history, merge commits with correct generation arithmetic, CCD clock-skew handling, multi-ref reachability, save/load equivalence, delete round-trip, unknown-sha lookups).

Full suite: **660 tests, 0 failures** (was 636 on main).

## Credo

`mix credo --strict` composition unchanged from main: 12 pre-existing findings, 0 new. I introduced one nesting finding in `Builder.collect_tips/1` and refactored it out before commit. The other 12 are pre-existing and not in this PR's scope; suggest tracking separately if we want to drive them to zero.

## What's next (not in this PR)

- **PR 2**: `Graph.ahead_behind/3`, `commits_between/3`, `is_ancestor?/3` on top of the in-memory graph, with property tests against the existing `Walk` module on fixture repos.
- **PR 3**: `Graph.update/3` incremental, wired into `commit_tree/3`, `merge_branches/4`, `squash_merge/4`, `rebase_commits/4`, `cherry_pick/*`, and the receive-pack path. Fuzz test: incremental result == full rebuild.
- **PR 4**: top-level `ExGitObjectstore.{ahead_behind,commits_between,is_ancestor?}` with auto-load + lazy build. This is the API Anvil will switch to (tracked in fangorn/anvil#55).

Closes nothing yet; part of fangorn/ex_git_objectstore#26.
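
For reviewers, the generation and CCD rules listed under `Graph.Builder` can be illustrated with a minimal sketch. This is not the code in this PR: the module name `GraphSketch`, the `annotate/1` function, and the input shape (a map of sha to `%{parents: [...], commit_time: ...}`) are hypothetical, and the sketch assumes every parent sha is present in the map.

```elixir
defmodule GraphSketch do
  @moduledoc false
  # Hypothetical illustration only; names and data shapes are assumptions.

  # commits: %{sha => %{parents: [sha], commit_time: integer}}
  # Returns:  %{sha => %{gen: pos_integer, ccd: integer}}
  def annotate(commits) do
    # Child-adjacency map: parent sha => list of child shas.
    children =
      Enum.reduce(commits, %{}, fn {sha, %{parents: parents}}, acc ->
        Enum.reduce(parents, acc, fn parent, inner ->
          Map.update(inner, parent, [sha], &[sha | &1])
        end)
      end)

    # Count of not-yet-processed parents per commit (Kahn's in-degree).
    pending = Map.new(commits, fn {sha, %{parents: ps}} -> {sha, length(ps)} end)

    # Roots have no parents, so they seed the work queue.
    roots = for {sha, 0} <- pending, do: sha

    walk(roots, commits, children, pending, %{})
  end

  defp walk([], _commits, _children, _pending, done), do: done

  defp walk([sha | rest], commits, children, pending, done) do
    %{parents: parents, commit_time: time} = Map.fetch!(commits, sha)
    parent_info = Enum.map(parents, &Map.fetch!(done, &1))

    # Generation: roots = 1, else max(parent.gen) + 1.
    gen = Enum.reduce(parent_info, 0, fn %{gen: g}, acc -> max(g, acc) end) + 1
    # CCD: roots = commit_time, else max(commit_time, max(parent.ccd)).
    ccd = Enum.reduce(parent_info, time, fn %{ccd: c}, acc -> max(c, acc) end)

    done = Map.put(done, sha, %{gen: gen, ccd: ccd})

    # A child becomes ready once all of its parents have been processed.
    {ready, pending} =
      children
      |> Map.get(sha, [])
      |> Enum.reduce({[], pending}, fn child, {ready, pend} ->
        left = Map.fetch!(pend, child) - 1
        pend = Map.put(pend, child, left)
        if left == 0, do: {[child | ready], pend}, else: {ready, pend}
      end)

    walk(rest ++ ready, commits, children, pending, done)
  end
end
```

With a root `a` at time 100, a clock-skewed child `b` at time 90, a second child `c` at time 120, and a merge `m` of `b` and `c` at time 110, this sketch gives `b` a CCD of 100 (inherited from its parent) and `m` a generation of 3 with a CCD of 120.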
SHA: 330a5b000540ce23b68c27d3bd2c3c87d0eab822
Author: Anvil <noreply@anvil.fangorn.io>
Date: 2026-04-18 15:16
Parents: aef8c4d
17 files changed +1763 -1
lib/ex_git_objectstore.ex +9 −0
@@ -43,6 +43,15 @@
* `merge_base/3` — lowest common ancestor of two commits
* `ancestor?/3` — true if A is an ancestor of B
## Commit-graph index
`ExGitObjectstore.Graph` provides an optional persisted commit-graph
index with topological generation numbers and corrected commit dates.
Once built and saved (`Graph.build/1`, `Graph.save/2`), it is loaded
wholesale into memory for fast ancestry / ahead-behind queries without
per-commit object reads. See that module and the `Graph.BinaryFormat`
moduledoc for details.
"""
alias ExGitObjectstore.{Merge, Object, ObjectResolver, Ref, Repo, Walk}