perf(upload-pack): stream objects through writer one at a time (epic #215 S1) #36

closed colechristensen cole.christensen@gmail.com wants to merge perf/stream-walker-into-writer into main
No CI

Phase 2 S1 from Anvil #215. Backed by a measured before/after.

What changes

After `UploadPackV2.collect_objects_maybe_shallow` returns the walker’s `[{type, content, sha}, …]` list, project it down to just `[{type, sha}, …]` and let the original list, content included, go out of scope so it can be garbage-collected. Pipe the `{type, sha}` list through `Stream.map(fn {type, sha} -> … ObjectResolver.read(repo, sha) … end)` and into the new `Writer.generate_stream_enum/4`. Peak heap in the pack-write phase is now bounded by one object’s content at a time instead of the whole object list.
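The projection and lazy re-read described above can be sketched roughly as follows. This is an illustration, not the PR's actual code: the module name `StreamSketch` and the `read_fun` parameter are hypothetical, and `read_fun` stands in for `ObjectResolver.read(repo, sha)`, whose real return shape is not shown in this description (here it is assumed to return `{:ok, content}`).

```elixir
defmodule StreamSketch do
  # Hypothetical stand-ins for the PR's drop_content/1 and
  # stream_object_contents/2 helpers.

  # Keep only {type, sha}. Once the caller drops its reference to the
  # original list, the content binaries become eligible for GC.
  def drop_content(objects) do
    Enum.map(objects, fn {type, _content, sha} -> {type, sha} end)
  end

  # Lazily re-read each object's content. Nothing is read until the
  # stream is consumed, and only one object is produced per step.
  # ASSUMPTION: read_fun.(sha) returns {:ok, content}.
  def stream_object_contents(type_shas, read_fun) do
    Stream.map(type_shas, fn {type, sha} ->
      {:ok, content} = read_fun.(sha)
      {type, content, sha}
    end)
  end
end

# Usage with an in-memory map standing in for the object store.
store = %{"aa" => "blob one", "bb" => "tree two"}
objects = [{:blob, "blob one", "aa"}, {:tree, "tree two", "bb"}]

type_shas = StreamSketch.drop_content(objects)
stream = StreamSketch.stream_object_contents(type_shas, &Map.fetch(store, &1))
```

Consuming `stream` with `Enum.to_list/1` yields the same tuples as `objects`, but a writer that reduces over it only ever holds one object's content at a time.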

Surface changes

  • New `Writer.generate_stream_enum/4`: accepts any `Enumerable.t/0` of `{type, content, sha}` tuples plus an explicit `count`. The pack header carries the object count and is covered by the pack’s trailing checksum, so the count can’t be deferred or patched in afterward. This is why we walk first to get the SHA list (which is small, 40 bytes/SHA) before streaming content.
  • `Writer.generate_stream/3` is now a thin wrapper around `generate_stream_enum/4` that supplies `length/1` for list inputs. Same public contract, same byte output.
  • `UploadPackV2.stream_packfile_response` uses the new functions via two new helpers: `drop_content/1` and `stream_object_contents/2`.
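To make the "count up front" constraint concrete, here is a minimal sketch of a streaming pack writer, under stated assumptions. The module `PackHeaderSketch`, its function names, and the `encode_fun` parameter are all hypothetical. Real pack entries carry a type/size header and zlib-deflated content, which `encode_fun` stands in for. The 12-byte header layout (`PACK`, version, object count, big-endian) and the trailing SHA-1 over all preceding bytes follow the Git pack format.

```elixir
defmodule PackHeaderSketch do
  # Pack header: "PACK", version 2, object count (32-bit big-endian each).
  def header(count) when is_integer(count) and count >= 0 do
    <<"PACK", 2::unsigned-big-32, count::unsigned-big-32>>
  end

  # Enumerable-based generator: header first, then one encoded entry per
  # object, then the SHA-1 of everything emitted so far. The count must be
  # known before the first object is read because it is part of the first
  # chunk fed into the running hash.
  def generate_enum(objects, count, encode_fun) do
    chunks = Stream.concat([[header(count)], Stream.map(objects, encode_fun), [:eof]])

    Stream.transform(chunks, :crypto.hash_init(:sha), fn
      :eof, ctx -> {[:crypto.hash_final(ctx)], ctx}
      chunk, ctx -> {[chunk], :crypto.hash_update(ctx, chunk)}
    end)
  end

  # List-based thin wrapper: same output, count supplied by length/1.
  def generate(objects, encode_fun) when is_list(objects) do
    generate_enum(objects, length(objects), encode_fun)
  end
end

# Usage with a toy encoder that emits the raw content (NOT real pack encoding).
encode = fn {_type, content, _sha} -> content end
objects = [{:blob, "hi", "aa"}]

eager = PackHeaderSketch.generate(objects, encode) |> Enum.join()
lazy = PackHeaderSketch.generate_enum(Stream.map(objects, & &1), 1, encode) |> Enum.join()
```

Both paths produce identical bytes, which is the same property the existing byte-equivalence tests rely on for the real writer.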

Measured impact (Anvil REQ-GIT-077 budget test, 10 MiB random-blob fetch)

Average peak BEAM heap delta over 3 runs of each version (`Anvil.Test.MemoryProbe` sampling `:erlang.memory(:total)` every 50 ms):

| Library version | Peak heap delta over baseline |
| --- | --- |
| main (eager: walker → list → writer) | 14.7 MB |
| this branch (objects streamed lazily) | 3.5 MB |
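For readers unfamiliar with this style of measurement, a peak-delta probe can be sketched as below. This is not the source of `Anvil.Test.MemoryProbe`. `PeakProbeSketch` is a hypothetical module that only illustrates the technique named above: poll `:erlang.memory(:total)` on an interval from a separate process and report the highest sample minus the starting value.

```elixir
defmodule PeakProbeSketch do
  # Runs `fun` while a sampler process polls :erlang.memory(:total).
  # Returns {result_of_fun, peak_bytes_above_baseline}.
  def measure(fun, interval_ms \\ 50) do
    baseline = :erlang.memory(:total)
    sampler = spawn_link(fn -> loop(baseline, interval_ms) end)

    result = fun.()

    # Ask the sampler for the highest value it has seen, then wait for it.
    send(sampler, {:stop, self()})

    receive do
      {:peak, peak} -> {result, peak - baseline}
    end
  end

  # Takes a sample, then either stops on request or sleeps one interval.
  defp loop(peak, interval_ms) do
    peak = max(peak, :erlang.memory(:total))

    receive do
      {:stop, from} -> send(from, {:peak, peak})
    after
      interval_ms -> loop(peak, interval_ms)
    end
  end
end

# Usage: measure a function that briefly builds a large list.
{sum, delta_bytes} =
  PeakProbeSketch.measure(fn -> Enum.sum(Enum.to_list(1..100_000)) end)
```

A polling probe like this can miss spikes shorter than the sampling interval, which is one reason to average several runs as done here.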

That’s a ~4× reduction (14.7 MB → 3.5 MB) on the test fixture. If the savings scale roughly linearly with pack size, prod hephaestus (~210 MB pack) should save more than 200 MB in absolute terms. The reduction is in the pack-write phase, where the original code held the full materialized object list alongside the in-flight pack bytes. The walk phase still materializes content; that’s what S4 (pass-through packed object reuse) will address.
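The extrapolation can be checked with back-of-the-envelope arithmetic. It assumes that both peaks grow linearly with pack size and that the 10 MiB fixture corresponds to roughly 10.5 MB of pack data; neither assumption has been measured on prod.

$$
\frac{14.7\ \text{MB}}{3.5\ \text{MB}} = 4.2, \qquad 14.7\ \text{MB} - 3.5\ \text{MB} = 11.2\ \text{MB saved on the fixture}
$$

$$
\frac{210\ \text{MB}}{10.5\ \text{MB}} \times 11.2\ \text{MB} = 20 \times 11.2\ \text{MB} = 224\ \text{MB}
$$

If the streamed peak instead stays close to constant as the pack grows, the absolute saving would be larger than this estimate.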

What this does NOT yet do

  • Walk-phase peak is unchanged. The walker still allocates every commit/tree/blob’s content during traversal. S4 will reuse the on-disk pack’s already-zlib-compressed bytes so the walk never re-encodes content from scratch.
  • I/O is doubled on the blob path: walker reads → drops content → stream re-reads. Fast on local disk; expensive on a cold S3 backend. S4 also addresses this by reading once.

I considered going straight to a fully lazy walker via `Stream.unfold` (no eager walk at all), which would compound to a ~6-8× total peak reduction. But that’s a bigger refactor (the walker uses mutually recursive functions that don’t lazy-evaluate cleanly), and the measured 4× win here is worth shipping now while we have the T1 property test as a safety net. The lazy walker is queued as a follow-up after S4 lands.
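For context on why the lazy walker is a larger refactor: `Stream.unfold/2` needs the traversal state to be an explicit value, so recursion has to be replaced with something like a work stack plus a seen-set. A minimal sketch of that shape, with a hypothetical `LazyWalkSketch` module and a hypothetical `children_fun` (sha to list of child shas) standing in for the real commit/tree parsing:

```elixir
defmodule LazyWalkSketch do
  # Lazily yields each reachable sha once, depth-first.
  # State is {pending_stack, seen_set}; no object is visited until the
  # consumer asks for the next element.
  def walk(roots, children_fun) do
    Stream.unfold({roots, MapSet.new()}, &step(&1, children_fun))
  end

  # Empty stack: the traversal is finished.
  defp step({[], _seen}, _children_fun), do: nil

  defp step({[sha | rest], seen}, children_fun) do
    if MapSet.member?(seen, sha) do
      # Already emitted: skip without producing an element.
      step({rest, seen}, children_fun)
    else
      {sha, {children_fun.(sha) ++ rest, MapSet.put(seen, sha)}}
    end
  end
end

# Usage with a toy object graph: commit -> tree -> two blobs.
graph = %{"c1" => ["t1"], "t1" => ["b1", "b2"], "b1" => [], "b2" => []}
walked = LazyWalkSketch.walk(["c1"], &Map.fetch!(graph, &1)) |> Enum.to_list()
```

The real walker also has to handle shallow boundaries and per-type parsing, which this sketch leaves out entirely.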

Tests

  • All 945 existing library tests pass — critically including the v2 byte-equivalence tests between `feed/2` (eager) and `feed/3` (streaming). If `generate_stream_enum` output didn’t byte-match the original `generate_stream`, those would fail.
  • Walker invariant property test (REQ-GIT-076) still green.

Tracks

Epic #215 / REQ-GIT-080.

Created May 14, 2026 at 15:26 UTC