epic #215 Phase 1 + S1: walker property test + lazy object streaming #37
epic-215-batch
into main
Bundles the library-side work for Anvil epic #215 into one PR per repo. Two commits, two REQs, both backed by failure-injection proof.
What’s in here
T1 — Walker invariant property test (REQ-GIT-076)
`stream_data` generator produces random commit graphs (1-3 chains of length 1-6, optional merges, optional mode-160000 gitlinks whose SHA targets either the root commit or the current tip). For every generated graph: assert `MapSet.new(walker_pack_shas) == bfs_reachable(repo, wants)`.
Catches the entire class of walker bugs that hit production in May 2026 (the gitlink-reachability regression — 1043 reachable objects silently pruned from `fangorn/hephaestus`’s pack).
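As a rough illustration of the oracle side of that assertion, the sketch below computes a BFS-reachable set over an in-memory object graph. It is not the test's actual `bfs_reachable/2`: the `ReachabilitySketch` module, the `objects` map shape (SHA to directly referenced SHAs), and the sample SHAs are all assumptions made for the example. A gitlink is modelled as just another edge out of a tree.

```elixir
defmodule ReachabilitySketch do
  @moduledoc false

  # Hypothetical graph shape: each SHA maps to the SHAs it references directly.
  # commit -> parents + root tree, tree -> entries (gitlink targets included),
  # blob -> []. Unknown SHAs are treated as leaves.
  @spec bfs_reachable(%{String.t() => [String.t()]}, [String.t()]) :: MapSet.t(String.t())
  def bfs_reachable(objects, wants) do
    walk(:queue.from_list(wants), MapSet.new(), objects)
  end

  defp walk(queue, seen, objects) do
    case :queue.out(queue) do
      {:empty, _} ->
        seen

      {{:value, sha}, rest} ->
        if MapSet.member?(seen, sha) do
          walk(rest, seen, objects)
        else
          children = Map.get(objects, sha, [])
          next = Enum.reduce(children, rest, fn child, q -> :queue.in(child, q) end)
          walk(next, MapSet.put(seen, sha), objects)
        end
    end
  end
end

objects = %{
  "c2" => ["c1", "t2"],
  "c1" => ["t1"],
  # "c0" stands in for a gitlink entry: it is only reachable through this tree.
  "t2" => ["b1", "c0"],
  "t1" => ["b1"],
  "c0" => ["t0"],
  "t0" => [],
  "b1" => [],
  "orphan" => []
}

reachable = ReachabilitySketch.bfs_reachable(objects, ["c2"])
```

In this toy graph, a walker that skipped the gitlink edge would omit `"c0"` and `"t0"`, so a set-equality check against `reachable` would fail, which is the shape of failure the property is meant to surface.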
Failure-injection proof. Temporarily reverting the `mode: "160000"` head in `collect_single_tree_entry` makes the property fail on iteration 0:
```
Failed with generated values (after 0 successful runs):
    Clause:    spec <- graph_spec()
    Generated: %{…, inject_gitlink?: true, gitlink_target: :self, …}

walker missed reachable SHAs: ["087267660a…", "209f7076a8…", … 8 SHAs total …]
```
With the clause restored, 80 random iterations pass in ~100 ms.
Adds `{:stream_data, "~> 1.1", only: :test}` to deps.
S1 — Stream walker objects through writer (REQ-GIT-080)
After `collect_objects_maybe_shallow` returns the walker’s `[{type, content, sha}, …]` list, project it down to `[{type, sha}, …]` and let the original content list go out of scope so it can be garbage-collected. Pipe the SHA list through a `Stream.map(fn {type, sha} -> … ObjectResolver.read(repo, sha) … end)` into a new `Writer.generate_stream_enum/4`. During the pack-write phase, peak heap is then bounded by roughly one object's content at a time rather than by the whole pack.
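The projection and lazy re-read described above might look roughly like the following. This is a sketch, not the code in this PR: the `LazyObjectsSketch` module is hypothetical, the two functions only borrow the helper names, and `read_fun` is a stand-in for `ObjectResolver.read/2`, whose real arguments and return shape are assumed here (a raw content binary).

```elixir
defmodule LazyObjectsSketch do
  @moduledoc false

  # Keep only {type, sha}. Once the caller drops its reference to the original
  # list, the content binaries are eligible for garbage collection.
  def drop_content(objects) do
    Enum.map(objects, fn {type, _content, sha} -> {type, sha} end)
  end

  # Build a lazy stream that re-reads each object's content on demand.
  # `read_fun` is a hypothetical stand-in for the real object reader.
  def stream_object_contents(type_shas, read_fun) do
    Stream.map(type_shas, fn {type, sha} -> {type, read_fun.(sha), sha} end)
  end
end

# Toy object store and walker output, for illustration only.
store = %{"aa" => "hello", "bb" => "world"}
walked = [{:blob, "hello", "aa"}, {:blob, "world", "bb"}]

slim = LazyObjectsSketch.drop_content(walked)
lazy = LazyObjectsSketch.stream_object_contents(slim, fn sha -> Map.fetch!(store, sha) end)

# Stream.map/2 only describes the work; content is read as elements are pulled.
first = Enum.take(lazy, 1)
```

Because the stream is pulled one element at a time, a consumer that writes each object out before asking for the next never holds more than one content binary from this stage.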
Surface changes:
- New `Writer.generate_stream_enum/4`: accepts an `Enumerable.t/0` plus an explicit count. The pack format needs the count in the hashed header, so it cannot be deferred. Same byte output as the existing `generate_stream/3`.
- `generate_stream/3` becomes a thin wrapper that supplies `length/1` for list inputs. Same public contract.
- `UploadPackV2.stream_packfile_response` uses `drop_content/1` + `stream_object_contents/2` helpers to set up the lazy stream.
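To show why the count has to be supplied up front, here is a minimal sketch of a streaming pack writer. It is not `Writer.generate_stream_enum/4`: the `PackSketch` module, `pack_stream/2`, and `pack_list/1` are hypothetical names with simplified arities, and only undeltified commit, tree, blob, and tag entries are handled. What it relies on from the pack format is the 12-byte header (`PACK`, version 2, object count), zlib-compressed entries, and a trailing SHA-1 over everything before it.

```elixir
defmodule PackSketch do
  @moduledoc false
  import Bitwise

  # Hypothetical simplified writer: `objects` is any enumerable of
  # {type, content, sha}; `count` must be known before the first byte is sent
  # because it sits in the header, which is covered by the trailing checksum.
  def pack_stream(objects, count) do
    header = <<"PACK", 2::32, count::32>>
    ctx = :crypto.hash_update(:crypto.hash_init(:sha), header)

    body =
      objects
      |> Stream.concat([:eof])
      |> Stream.transform(ctx, fn
        :eof, ctx ->
          # Trailer: SHA-1 of every byte emitted so far.
          {[:crypto.hash_final(ctx)], ctx}

        {type, content, _sha}, ctx ->
          chunk = entry_header(type, byte_size(content)) <> :zlib.compress(content)
          {[chunk], :crypto.hash_update(ctx, chunk)}
      end)

    Stream.concat([header], body)
  end

  # List inputs can derive the count themselves.
  def pack_list(objects) when is_list(objects), do: pack_stream(objects, length(objects))

  # First byte: continuation bit, 3-bit type, low 4 bits of the size.
  # Remaining size bits follow in 7-bit groups, least significant first.
  defp entry_header(type, size) do
    first = (type_code(type) <<< 4) ||| (size &&& 0x0F)
    rest = size >>> 4
    if rest == 0, do: <<first>>, else: <<(first ||| 0x80)>> <> varint(rest)
  end

  defp varint(n) when n < 0x80, do: <<n>>

  defp varint(n) do
    byte = (n &&& 0x7F) ||| 0x80
    <<byte>> <> varint(n >>> 7)
  end

  defp type_code(:commit), do: 1
  defp type_code(:tree), do: 2
  defp type_code(:blob), do: 3
  defp type_code(:tag), do: 4
end

objects = [{:blob, "hello", "aa"}, {:blob, "world", "bb"}]

# Lazy input with an explicit count, and the list wrapper, for comparison.
lazy_bytes = objects |> Stream.map(& &1) |> PackSketch.pack_stream(2) |> Enum.join()
list_bytes = objects |> PackSketch.pack_list() |> Enum.join()
```

In this sketch the list wrapper and the lazy path go through the same writer, so they yield identical bytes, mirroring the "same byte output" property claimed for the real functions.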
Measured impact (Anvil REQ-GIT-077 budget test, 10 MiB random-blob fetch, avg peak BEAM heap delta over 3 runs):
| Library version | Peak heap delta |
|---|---|
| main (eager) | 14.7 MB |
| this branch | 3.5 MB |
~4× reduction on the test fixture (about 11 MB saved on a 10 MiB fetch). Assuming the saving scales roughly linearly with pack size, absolute savings on prod hephaestus (~210 MB pack) should exceed 200 MB. The reduction is in the pack-write phase; the walk phase still allocates content (will be addressed by S4 / pass-through packed object reuse in a follow-up).
Test plan
- `mix test` — 945 tests, 0 failures (51 excluded)
- `mix format --check-formatted` clean
- `mix credo --strict` clean
- Property test demonstrated to fail on the gitlink shape when the fix is reverted
- S1 byte output verified equivalent to non-streaming via existing v2 byte-equivalence tests
Tracks
Epic Anvil #215 (REQ-GIT-076, REQ-GIT-080).