fangorn/ex_git_objectstore
ref:ed878d981a4f22a333de680e275b1a6f7b9dd184
epic #215 Phase 1 + S1: walker property test + lazy object streaming (#37)
Bundles the library-side work for Anvil epic #215 into one PR per
repo. Two commits, two REQs, both backed by failure-injection proof.
## What's in here
### T1 — Walker invariant property test (REQ-GIT-076)
A `stream_data` generator produces random commit graphs (1-3 chains
of length 1-6, optional merges, optional mode-160000 gitlinks whose
SHA targets either the root commit or the current tip). For every
generated graph, assert
`MapSet.new(walker_pack_shas) == bfs_reachable(repo, wants)`.
This catches the entire class of walker bugs that hit production in
May 2026: the gitlink-reachability regression, in which 1043 reachable
objects were silently pruned from `fangorn/hephaestus`'s pack.
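
The invariant can be illustrated in plain Elixir without `stream_data`. This is only a toy sketch: the `ReachabilitySketch` module, the tagged-edge `graph` map, and the `follow_gitlinks?` switch are hypothetical stand-ins, not the library's walker or the test's real `bfs_reachable/2`.

```elixir
defmodule ReachabilitySketch do
  # Hypothetical toy model: `graph` maps an object id to tagged edges
  # {kind, child_id}. Gitlink edges are tagged so a buggy walker can be
  # simulated by skipping them.

  # Reference oracle: plain BFS that follows every edge.
  def bfs_reachable(graph, wants) do
    do_bfs(graph, wants, MapSet.new(), fn _edge -> true end)
  end

  # Simulated walker; `follow_gitlinks?: false` mimics the regression.
  def walk(graph, wants, opts \\ []) do
    follow? = Keyword.get(opts, :follow_gitlinks?, true)
    do_bfs(graph, wants, MapSet.new(), fn {kind, _id} -> follow? or kind != :gitlink end)
  end

  defp do_bfs(_graph, [], seen, _keep?), do: seen

  defp do_bfs(graph, [id | rest], seen, keep?) do
    if MapSet.member?(seen, id) do
      do_bfs(graph, rest, seen, keep?)
    else
      children =
        graph
        |> Map.get(id, [])
        |> Enum.filter(keep?)
        |> Enum.map(fn {_kind, child} -> child end)

      do_bfs(graph, rest ++ children, MapSet.put(seen, id), keep?)
    end
  end
end

# Two commits, plus a commit "s1" that is reachable only through a gitlink.
graph = %{
  "c2" => [{:parent, "c1"}, {:tree, "t2"}],
  "c1" => [{:tree, "t1"}],
  "t2" => [{:blob, "b1"}, {:gitlink, "s1"}],
  "t1" => [{:blob, "b0"}],
  "s1" => [{:tree, "ts"}]
}

wants = ["c2"]
expected = ReachabilitySketch.bfs_reachable(graph, wants)

# Healthy walker: the property holds.
true = ReachabilitySketch.walk(graph, wants) == expected

# Injected failure: list what the walker missed.
missed =
  expected
  |> MapSet.difference(ReachabilitySketch.walk(graph, wants, follow_gitlinks?: false))
  |> Enum.sort()

IO.inspect(missed, label: "walker missed reachable SHAs")
```

With the gitlink edge skipped, the walker's set is a strict subset of the BFS set, so the equality fails and the difference names the pruned objects.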
**Failure-injection proof.** Temporarily reverting the
`mode: "160000"` head in `collect_single_tree_entry` makes the
property fail on iteration 0:
```
Failed with generated values (after 0 successful runs):
Clause: spec <- graph_spec()
Generated: %{..., inject_gitlink?: true, gitlink_target: :self, ...}
walker missed reachable SHAs: ["087267660a...", "209f7076a8...",
... 8 SHAs total ...]
```
With the fix restored, 80 random iterations pass in ~100 ms.
Adds `{:stream_data, "~> 1.1", only: :test}` to deps.
### S1 — Stream walker objects through writer (REQ-GIT-080)
After `collect_objects_maybe_shallow` returns the walker's
`[{type, content, sha}, ...]` list, project it down to
`[{type, sha}, ...]` and let the original content list go out of
scope so it can be garbage-collected. Pipe the SHA list through a
`Stream.map(fn {type, sha} -> ... ObjectResolver.read(repo, sha) ...
end)` into a new `Writer.generate_stream_enum/4`. Peak heap in the
pack-write phase is then bounded by one object's content at a time.
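
A minimal sketch of the projection and the lazy re-read, under stated assumptions: the helper bodies are guesses based only on their names, `read_fun` is a hypothetical stand-in for `ObjectResolver.read(repo, sha)`, and the `{:ok, content}` return shape is assumed rather than taken from the library.

```elixir
defmodule LazyPackSketch do
  # Keep only {type, sha}; the returned list no longer references the
  # content binaries.
  def drop_content(objects) do
    Enum.map(objects, fn {type, _content, sha} -> {type, sha} end)
  end

  # Re-read each object's content only when the consumer asks for it.
  # `read_fun` is a hypothetical stand-in for the real object reader.
  def stream_object_contents(refs, read_fun) do
    Stream.map(refs, fn {type, sha} ->
      {:ok, content} = read_fun.(sha)
      {type, content, sha}
    end)
  end
end

# In-memory stand-in for the object store. Each read sends a message to the
# current process so the laziness is observable.
store = %{"aaa" => "hello", "bbb" => "world"}

read_fun = fn sha ->
  send(self(), {:read, sha})
  {:ok, Map.fetch!(store, sha)}
end

refs = LazyPackSketch.drop_content([{:blob, "hello", "aaa"}, {:blob, "world", "bbb"}])
stream = LazyPackSketch.stream_object_contents(refs, read_fun)

# Building the stream reads nothing; taking one element reads exactly one object.
first = Enum.take(stream, 1)
```

Because `Stream.map/2` is lazy, only the element the writer is currently consuming is materialized, which is what keeps the write phase from holding every object's content at once.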
Surface changes:
- **New** `Writer.generate_stream_enum/4`: accepts an `Enumerable.t/0`
  plus an explicit count. The pack format needs the count in the hashed
  header, so it cannot be deferred. Same byte output as the existing
  `generate_stream/3`.
- `generate_stream/3` becomes a thin wrapper that supplies the count
  via `length/1` for list inputs. Same public contract.
- `UploadPackV2.stream_packfile_response` uses the
  `drop_content/1` + `stream_object_contents/2` helpers to set
  up the lazy stream.
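
Why the count must be explicit can be seen from the pack layout itself: a v2 pack starts with the 4-byte `PACK` signature, a 4-byte version, and a 4-byte object count, and ends with a SHA-1 over everything before it. The sketch below is not the library's `Writer`. `PackHeaderSketch` is a hypothetical module that treats entries as opaque, already-encoded binaries and skips the per-object type/size header and zlib compression.

```elixir
defmodule PackHeaderSketch do
  # Pack v2 header: "PACK", version 2, object count (32-bit big-endian each).
  # The count precedes every entry and is covered by the trailing SHA-1,
  # so it has to be known before the first entry is emitted.
  def header(count) when is_integer(count) and count >= 0 do
    <<"PACK", 2::32, count::32>>
  end

  # Hypothetical enumerable-based writer: `encoded_entries` is any
  # Enumerable of binaries, `count` is supplied by the caller.
  def stream(encoded_entries, count) do
    [[header(count)], encoded_entries, [:eof]]
    |> Stream.concat()
    |> Stream.transform(:crypto.hash_init(:sha), fn
      # End marker: emit the SHA-1 trailer over all preceding bytes.
      :eof, ctx -> {[:crypto.hash_final(ctx)], ctx}
      # Normal chunk: pass it through and fold it into the running hash.
      chunk, ctx -> {[chunk], :crypto.hash_update(ctx, chunk)}
    end)
  end

  # Thin list wrapper: the count comes from length/1.
  def stream(encoded_entries) when is_list(encoded_entries) do
    stream(encoded_entries, length(encoded_entries))
  end
end

entries = ["entry-one", "entry-two"]

eager = entries |> PackHeaderSketch.stream() |> Enum.to_list() |> IO.iodata_to_binary()

lazy =
  entries
  |> Stream.map(& &1)
  |> PackHeaderSketch.stream(2)
  |> Enum.to_list()
  |> IO.iodata_to_binary()

# Same bytes whether the entries arrive as a list or as a lazy stream.
true = eager == lazy
```

A lazy enumerable cannot report its length without being consumed, which is why the enumerable variant takes the count as an argument while the list variant can compute it.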
**Measured impact** (Anvil REQ-GIT-077 budget test, 10 MiB random-blob
fetch, avg peak BEAM heap delta over 3 runs):
| Library version | Peak heap delta |
|---|---|
| main (eager) | 14.7 MB |
| **this branch** | **3.5 MB** |
~4× reduction on the test fixture. Scaled to prod hephaestus
(~210 MB pack), absolute savings should exceed 200 MB. The
reduction is in the pack-write phase; the walk phase still
allocates content (will be addressed by S4 / pass-through packed
object reuse in a follow-up).
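
The ratio and the production estimate follow from the table by simple arithmetic. The snippet below reproduces them under one loud assumption that the measurements do not prove: that the heap savings scale linearly with pack size.

```elixir
# Figures from the table above.
eager_mb = 14.7
lazy_mb = 3.5

# 10 MiB fixture expressed in MB, and the approximate production pack size.
fixture_mb = 10 * 1024 * 1024 / 1_000_000
prod_pack_mb = 210.0

ratio = eager_mb / lazy_mb
saved_per_fixture_mb = eager_mb - lazy_mb

# Assumption: savings grow linearly with pack size.
projected_savings_mb = saved_per_fixture_mb * (prod_pack_mb / fixture_mb)

IO.puts("reduction factor: #{Float.round(ratio, 1)}")
IO.puts("projected savings: #{Float.round(projected_savings_mb, 0)} MB")
```

This gives a factor of roughly 4.2 and a projected saving of about 224 MB, consistent with the "exceed 200 MB" estimate.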
## Test plan
- [x] `mix test`: 945 tests, 0 failures (51 excluded)
- [x] `mix format --check-formatted` clean
- [x] `mix credo --strict` clean
- [x] Property test demonstrated to fail on the gitlink shape
  when the fix is reverted
- [x] S1 byte output verified equivalent to non-streaming via
  existing v2 byte-equivalence tests
## Tracks
Epic Anvil #215 (REQ-GIT-076, REQ-GIT-080).
SHA: ed878d981a4f22a333de680e275b1a6f7b9dd184
Author: Anvil <noreply@anvil.fangorn.io>
Date: 2026-05-14 18:10
Parents: 81b6f84
5 files changed, +450 −10
| File | Changes |
|---|---|
| mix.exs | +2 −1 |
| mix.lock | +1 −0 |