feat: commit-graph index — part 1 (storage + binary format + build) (#15)
PR 1 of 4 toward fangorn/ex_git_objectstore#26. Adds the foundation for an in-memory commit-graph index so that `ahead_behind`, `commits_between`, and `is_ancestor?` can run in time proportional to the number of commits touched, rather than with the current O(max_walk) walker that issues one `cat_object` per step. Anvil's PR list/show pages have been saturating prod CPU on that walker (see fangorn/anvil#55). This PR is strictly additive: no callers yet.
## What changes
**Storage behaviour:** four new callbacks for generic side-index blobs.
- `put_blob(config, prefix, key, data)`
- `get_blob(config, prefix, key)`
- `delete_blob(config, prefix, key)`
- `blob_exists?(config, prefix, key)`
Implemented in the `Memory`, `Filesystem`, and `S3` backends. Keys are repo-scoped, slash-separated paths (e.g. `graph/commit-graph.v1`). All three backends reject `..` traversal and absolute paths.
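The key rules above (relative, slash-separated, no `..`, no absolute paths) can be sketched as a small validator. This is an illustration only: the module name `BlobKeySketch`, the function `validate/1`, and the error atoms are hypothetical and are not taken from the library's code.

```elixir
defmodule BlobKeySketch do
  @moduledoc "Hypothetical key validation for side-index blobs; not the library's code."

  # Accepts relative, slash-separated keys. Rejects absolute paths,
  # `..` traversal, and empty or `.` segments.
  def validate(key) when is_binary(key) and key != "" do
    segments = String.split(key, "/")

    cond do
      String.starts_with?(key, "/") -> {:error, :absolute_path}
      Enum.any?(segments, &(&1 == "..")) -> {:error, :path_traversal}
      Enum.any?(segments, &(&1 in ["", "."])) -> {:error, :invalid_segment}
      true -> :ok
    end
  end

  # Anything that is not a non-empty binary is rejected outright.
  def validate(_key), do: {:error, :invalid_key}
end
```

Under these assumptions, `BlobKeySketch.validate("graph/commit-graph.v1")` returns `:ok`, while `"graph/../secrets"` and `"/etc/passwd"` are rejected before any backend is touched. Checking segments instead of searching for the substring `..` keeps keys such as `commit-graph..v1` legal.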
**`ExGitObjectstore.Graph`:** struct + persistence.
- `load/1`, `save/2`, `delete/1` against blob key `graph/commit-graph.v1`.
- Lookup API: `generation/2`, `corrected_commit_date/2`, `parents/2`, `member?/2`, `size/1`.
**`ExGitObjectstore.Graph.BinaryFormat`:** serialize/deserialize.
```
Header (12 B):      magic "ECG1", u32 version=1, u32 commit_count
Fan-out (1024 B):   256 × u32 (entry i = count of OIDs with first byte ≤ i)
OID table:          N × 20 raw SHA bytes, ascending
Offset table:       N × u32 byte offset into the entries block
Entries (var):      tree_oid (20) + gen (u32) + ccd (u64) + ctime (u64)
                    + parent_count (u8) + parent_indices (u32 × pc)
```
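All integers are big-endian. SHAs are stored as 20 raw bytes and appear as hex only at the API boundary. An entry takes ~45 B per commit for typical one-parent history (20 + 4 + 8 + 8 + 1 + 4), plus 24 B per commit across the two lookup tables (20 B OID + 4 B offset). Output is deterministic: the same graph produces identical bytes.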
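To make the layout concrete, here is a minimal sketch of encoding the header, the fan-out table, and a single entry with Elixir bitstrings. The module `GraphFormatSketch` and its function names are assumptions made for illustration; only the field order and widths come from the layout above.

```elixir
defmodule GraphFormatSketch do
  @moduledoc "Illustrative encoder for the layout above; names are hypothetical."

  @magic "ECG1"
  @version 1

  # Header: 4-byte magic, u32 version, u32 commit count (12 bytes).
  # Elixir integers in bitstrings are big-endian by default.
  def header(commit_count), do: <<@magic::binary, @version::32, commit_count::32>>

  # Fan-out: entry i is the number of OIDs whose first byte is <= i.
  # `oids` is a list of 20-byte raw binaries.
  def fanout(oids) do
    counts = Enum.frequencies_by(oids, fn <<first, _rest::binary>> -> first end)

    {entries, _total} =
      Enum.map_reduce(0..255, 0, fn byte, running ->
        total = running + Map.get(counts, byte, 0)
        {<<total::32>>, total}
      end)

    IO.iodata_to_binary(entries)
  end

  # One entry: tree OID, generation, corrected commit date, commit time,
  # parent count, then one u32 OID-table index per parent.
  def entry(tree_oid, gen, ccd, ctime, parent_indices) when byte_size(tree_oid) == 20 do
    parents = for idx <- parent_indices, into: <<>>, do: <<idx::32>>
    <<tree_oid::binary, gen::32, ccd::64, ctime::64, length(parent_indices)::8, parents::binary>>
  end
end
```

With this sketch, `byte_size(GraphFormatSketch.header(3))` is 12, the fan-out is always 1024 bytes, and a one-parent entry is 45 bytes, which matches the per-commit figure quoted above.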
**`ExGitObjectstore.Graph.Builder`:** full scan.
- Collects tips from branches and tags, peeling annotated tags to commits.
- Walks the DAG breadth-first, with one `cat_object` per reachable commit.
- Runs Kahn's algorithm over a child-adjacency map to compute generation numbers and corrected commit dates (CCD) in a single topological pass.
- Generation: roots = 1, else `max(parent.gen) + 1`.
- CCD: roots = `commit_time`, else `max(commit_time, max(parent.ccd))`. This is modelled on git's corrected committer date: a commit's CCD is never lower than any parent's, even when a parent's clock is skewed ahead.
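The topological pass described above can be sketched as follows. This is not the `Builder` implementation: the module `GenerationSketch`, the input shape (a map from commit id to `%{parents: [...], commit_time: ...}`), and the output shape are assumptions for illustration, and every parent is assumed to be present in the map.

```elixir
defmodule GenerationSketch do
  @moduledoc "Illustrative Kahn-style pass; input and output shapes are hypothetical."

  # `commits` maps an id to %{parents: [id], commit_time: integer}.
  # Returns a map from id to %{gen: integer, ccd: integer}.
  def compute(commits) do
    # Child adjacency: parent id -> list of child ids.
    children =
      Enum.reduce(commits, %{}, fn {id, %{parents: parents}}, acc ->
        Enum.reduce(parents, acc, fn p, acc2 -> Map.update(acc2, p, [id], &[id | &1]) end)
      end)

    # Number of parents still unprocessed for each commit.
    pending = Map.new(commits, fn {id, c} -> {id, length(c.parents)} end)
    roots = for {id, 0} <- pending, do: id

    walk(roots, commits, children, pending, %{})
  end

  defp walk([], _commits, _children, _pending, done), do: done

  defp walk([id | queue], commits, children, pending, done) do
    %{parents: parents, commit_time: ctime} = Map.fetch!(commits, id)
    parent_info = Enum.map(parents, &Map.fetch!(done, &1))

    # Roots get gen 1 and ccd = commit_time; others take the max over parents.
    gen = Enum.reduce(parent_info, 0, fn p, acc -> max(p.gen, acc) end) + 1
    ccd = Enum.reduce(parent_info, ctime, fn p, acc -> max(p.ccd, acc) end)
    done = Map.put(done, id, %{gen: gen, ccd: ccd})

    # Release children whose parents have all been processed.
    {ready, pending} =
      Enum.reduce(Map.get(children, id, []), {[], pending}, fn child, {ready, pend} ->
        left = Map.fetch!(pend, child) - 1
        pend = Map.put(pend, child, left)
        if left == 0, do: {[child | ready], pend}, else: {ready, pend}
      end)

    walk(queue ++ ready, commits, children, pending, done)
  end
end
```

For a root `a` at time 100 with a clock-skewed child `b` at time 50, this yields `gen: 2, ccd: 100` for `b`: the child inherits the parent's date instead of appearing older than its own ancestor. A merge of two generation-2 parents gets generation 3.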
## Tests
- 20 new storage-layer tests across the three backends (round-trip, overwrite, delete, missing key, traversal rejection, prefix isolation).
- 12 binary-format tests (round-trip for empty/linear/merge/octopus, large u64 timestamps, determinism, header magic + version + truncation rejection, fan-out correctness).
- 12 build/load/save tests (empty repo, linear history, merge commits with correct generation arithmetic, CCD clock-skew handling, multi-ref reachability, save/load equivalence, delete round-trip, unknown-sha lookups).
Full suite: **660 tests, 0 failures** (was 636 on main).
## Credo
`mix credo --strict` composition is unchanged from main: 12 pre-existing findings, 0 new. I introduced one nesting finding in `Builder.collect_tips/1` and refactored it out before committing. The 12 existing findings are outside this PR's scope; I suggest tracking them separately if we want to drive them to zero.
## What's next (not in this PR)
- **PR 2:** `Graph.ahead_behind/3`, `commits_between/3`, `is_ancestor?/3` on top of the in-memory graph, with property tests against the existing `Walk` module on fixture repos.
- **PR 3:** incremental `Graph.update/3`, wired into `commit_tree/3`, `merge_branches/4`, `squash_merge/4`, `rebase_commits/4`, `cherry_pick/*`, and the receive-pack path. Fuzz test: incremental result == full rebuild.
- **PR 4:** top-level `ExGitObjectstore.{ahead_behind,commits_between,is_ancestor?}` with auto-load + lazy build. This is the API Anvil will switch to (tracked in fangorn/anvil#55).
Closes nothing yet; part of fangorn/ex_git_objectstore#26.
SHA: 330a5b000540ce23b68c27d3bd2c3c87d0eab822
Author: Anvil <noreply@anvil.fangorn.io>
Date: 2026-04-18 15:16
Parents: aef8c4d
17 files changed, +1763 −1
| File | Changes |
|---|---|
| lib/ex_git_objectstore.ex | +9 −0 |