feat(upload-pack): streaming v2 fetch response — no full-pack buffering #32

merged colechristensen cole.christensen@gmail.com wants to merge stream-pack-response into main
No CI

Summary

Add a streaming variant for protocol v2 fetch responses so the entire packfile no longer materializes in BEAM process heap before being sent.

Tracking: Anvil #214 (and bundled with Anvil #213 / Anvil PR #135 — the SSH window fix is useless without this fix and vice versa).

Why

Building a 161 MB pack via the current `UploadPackV2.feed/2` holds three full-pack-sized binaries transiently:

  1. `pack_data` returned from `Writer.generate/1`
  2. `sideband_data` after `PktLine.encode_sideband(1, pack_data)`
  3. `response` after `IO.iodata_to_binary([ack, shallow, hdr, sideband_data, flush])`

The object list is held on top of these. After Erlang refc-binary fragmentation, peak heap reaches ~10× the pack size. The 2026-05-11 02:50 prod BEAM OOM-kill (verified via dmesg + Anvil.Perf.MemoryMonitor logs) is the consequence: a single fetch pushes a 3.8 GiB no-swap host past available RAM.

Per the project performance protocol, this is a per-unit cost in a shared primitive — every fetch on every repo pays it.

What’s in this PR

All additive — no behaviour changes to `feed/2` or `Writer.generate/1`.

`Writer.generate_stream/2` and `/3`

Callback-based pack writer. Invokes `write_fn` with each piece of the pack as it’s produced: 12-byte header → one zlib-compressed entry per object → trailing 20-byte SHA-1 checksum. SHA-1 is computed incrementally via `:crypto.hash_update/2` so no full-pack binary ever exists.

`/3` threads an accumulator through each write — needed by the sideband chunker downstream.
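The callback shape described above can be sketched roughly as follows. This is a simplified illustration, not the PR's implementation: the module name `StreamPackSketch`, the `{type, binary}` input tuples, and the argument order are assumptions, and delta encoding is left out entirely.

```elixir
defmodule StreamPackSketch do
  import Bitwise

  # Git pack object type codes for non-delta objects.
  @types %{commit: 1, tree: 2, blob: 3, tag: 4}

  # Streams a version-2 pack for `objects` (a list of {type, binary} tuples,
  # a hypothetical input shape). `write_fn.(iodata, acc)` must return the new acc.
  def generate_stream(objects, acc, write_fn) when is_function(write_fn, 2) do
    header = <<"PACK", 2::32, length(objects)::32>>
    state = emit(header, {:crypto.hash_init(:sha), acc}, write_fn)

    {ctx, acc} =
      Enum.reduce(objects, state, fn {type, data}, st ->
        code = Map.fetch!(@types, type)
        entry = [entry_header(code, byte_size(data)), :zlib.compress(data)]
        emit(entry, st, write_fn)
      end)

    # Trailer: SHA-1 over every byte written so far; the trailer itself is not hashed.
    write_fn.(:crypto.hash_final(ctx), acc)
  end

  # Feeds the piece to the running hash and to the caller's writer.
  defp emit(iodata, {ctx, acc}, write_fn) do
    {:crypto.hash_update(ctx, iodata), write_fn.(iodata, acc)}
  end

  # First byte: type in bits 4..6, low 4 bits of size; remaining size bits
  # follow in 7-bit groups, with the high bit marking continuation.
  defp entry_header(type, size) do
    encode_rest((type <<< 4) ||| (size &&& 0x0F), size >>> 4)
  end

  defp encode_rest(byte, 0), do: <<byte>>

  defp encode_rest(byte, rest) do
    <<(byte ||| 0x80)>> <> encode_rest(rest &&& 0x7F, rest >>> 7)
  end
end
```

A caller that only wants to forward bytes can pass a socket or similar handle as `acc` and return it unchanged from `write_fn`; a caller that re-chunks (such as a sideband writer) returns its updated state instead.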

`PktLine.encode_sideband_frame/2` + `max_sideband_data/0`

Single-frame encoder for streaming callers; existing `encode_sideband/2` (which materializes everything) is unchanged.
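For readers unfamiliar with the framing, a single sideband frame is a pkt-line whose payload starts with a one-byte band id. The sketch below shows one way such an encoder could look; the module name `SidebandFrameSketch` and the iodata return type are assumptions, not the library's actual definitions.

```elixir
defmodule SidebandFrameSketch do
  # A pkt-line is at most 65520 bytes including its 4-byte hex length prefix.
  # One further byte is the band id, leaving 65515 bytes for payload.
  @max_pkt 65_520
  @max_data @max_pkt - 4 - 1

  def max_sideband_data, do: @max_data

  # Encodes one frame as iodata: 4 hex digits of total length, band byte, payload.
  def encode_sideband_frame(band, data)
      when band in 1..3 and byte_size(data) <= @max_data do
    len = 4 + 1 + byte_size(data)

    hex =
      len
      |> Integer.to_string(16)
      |> String.downcase()
      |> String.pad_leading(4, "0")

    [hex, band, data]
  end
end
```

Returning iodata rather than a binary avoids copying the payload just to prepend five bytes, which matters when the caller emits many full-size frames.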

`ExGitObjectstore.Protocol.SidebandWriter` (new)

Re-chunks arbitrary-sized writes into spec-compliant sideband-1 frames (≤ 65515 payload bytes per frame). Buffer held as iodata for O(1) amortized append; only materializes to a binary when draining a full-size frame.
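The re-chunking behaviour can be illustrated with a small sketch. The struct layout, the function names, and the arity-1 `write_fn` stored in the struct are all assumptions made for this example; the real `SidebandWriter` API may differ.

```elixir
defmodule SidebandChunkerSketch do
  # Maximum sideband payload per frame: 65520 - 4 (length prefix) - 1 (band byte).
  @max_data 65_515

  defstruct buf: [], size: 0, write_fn: nil

  def new(write_fn) when is_function(write_fn, 1), do: %__MODULE__{write_fn: write_fn}

  # Appends iodata to the buffer (no copy), then emits any full frames.
  def write(%__MODULE__{} = w, iodata) do
    drain(%{w | buf: [w.buf, iodata], size: w.size + IO.iodata_length(iodata)})
  end

  # Emits whatever is left as one short frame; emits nothing if the buffer is empty.
  def flush(%__MODULE__{size: 0} = w), do: w

  def flush(%__MODULE__{} = w) do
    w.write_fn.(frame(IO.iodata_to_binary(w.buf)))
    %{w | buf: [], size: 0}
  end

  defp drain(%__MODULE__{size: size} = w) when size < @max_data, do: w

  defp drain(%__MODULE__{} = w) do
    # Materialize only once at least one full frame is available.
    all = IO.iodata_to_binary(w.buf)
    chunk = binary_part(all, 0, @max_data)
    rest = binary_part(all, @max_data, byte_size(all) - @max_data)
    w.write_fn.(frame(chunk))
    drain(%{w | buf: rest, size: byte_size(rest)})
  end

  # Band 1 frame: 4 hex digits of total length, band byte, payload.
  defp frame(data) do
    hex =
      (byte_size(data) + 5)
      |> Integer.to_string(16)
      |> String.downcase()
      |> String.pad_leading(4, "0")

    [hex, 1, data]
  end
end
```

In this sketch a write that lands exactly on the frame boundary leaves an empty buffer, so a later `flush/1` emits no partial frame.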

`UploadPackV2.feed/3`

Streaming counterpart to `feed/2`. ls-refs and multi-round ack responses still arrive as a single `write_fn` call (they’re small); packfile responses stream the prefix once, then push pack bytes through the sideband chunker, then write the trailing flush.

Tests

42 new assertions across three suites. All pass; full repo suite (942 tests) green.

  • Writer streaming: streamed bytes are byte-identical to `Writer.generate/1` output for single-blob, many-mixed-object, and reader-roundtrip cases. `/3` accumulator threading verified.
  • SidebandWriter: small writes coalesce into one frame; oversized writes split into multiple spec-compliant frames; exact-boundary writes drain cleanly without a partial frame.
  • UploadPackV2 feed/3: ls-refs, multi-round acks, and full clone fetch responses are byte-identical to `feed/2`. A 70 KiB-blob fetch emits multiple chunks (not one giant binary), every chunk ≤ 65520 bytes (4-byte length prefix + 1 band byte + ≤ 65515 payload bytes).
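A frame-size assertion like the one in the last bullet needs the response split back into pkt-lines. The helper below is a hypothetical sketch of such a splitter, not code from the test suites; it assumes well-formed input and handles only the flush, delimiter, and response-end special packets.

```elixir
defmodule PktLineSplitSketch do
  # Splits a binary of concatenated pkt-lines into :flush, :delim,
  # :response_end, or {:data, payload} items. Assumes well-formed input.
  def split(<<>>), do: []
  def split(<<"0000", rest::binary>>), do: [:flush | split(rest)]
  def split(<<"0001", rest::binary>>), do: [:delim | split(rest)]
  def split(<<"0002", rest::binary>>), do: [:response_end | split(rest)]

  def split(<<hex::binary-size(4), rest::binary>>) do
    # The length prefix counts itself, so the payload is 4 bytes shorter.
    payload_len = String.to_integer(hex, 16) - 4
    <<payload::binary-size(payload_len), tail::binary>> = rest
    [{:data, payload} | split(tail)]
  end

  # Total on-wire size of each data packet, prefix included.
  def frame_sizes(binary) do
    for {:data, payload} <- split(binary), do: byte_size(payload) + 4
  end
end
```

With this, a check such as `Enum.all?(PktLineSplitSketch.frame_sizes(response), &(&1 <= 65_520))` expresses the size bound directly.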

Memory impact

Before (per the existing `build_packfile_response/6`): peak heap ≈ pack size × 3 + object list, fragmented by refc binaries.

After (`feed/3` path): peak heap is bounded by the largest single zlib-compressed object plus a single sideband frame (≤ 65515 bytes). For Anvil’s prod fetch of fangorn/hephaestus (161 MB pack, 1445 objects, largest object well under 1 MB), this should drop the ~1.95 GB transient peak to well under 100 MB.

Consumer migration

Anvil’s SSH CLI will adopt `feed/3` in fangorn/anvil PR #135. Other consumers continue using `feed/2` until they want to opt in.

Acceptance

  • Streamed pack output is byte-identical to non-streaming for matched inputs.
  • Sideband-1 frames stay within spec (≤ 65515 payload bytes per frame; ≤ 65520 bytes including the pkt-line length prefix and band byte).
  • Existing tests + new tests all pass (942 tests, 0 failures).
  • No new `mix credo --strict` findings on the touched files.

Out of scope

  • Caching packs that are currently rebuilt from scratch (mentioned in Anvil #214; separate issue).
  • Pack-build CPU cost (~16s for 161 MB pack); the algorithmic improvement is a separate PR.
Created May 11, 2026 at 15:03 UTC | Merged May 12, 2026 at 00:43 UTC by colechristensen cole.christensen@gmail.com