ref:44a17299073eb06719b0ccbf12678b75b33525d2

feat(upload-pack): streaming v2 fetch response — no full-pack buffering

Adds a streaming variant for protocol v2 fetches so the entire packfile no
longer materializes in BEAM process heap before being sent. Previously,
building a 161 MB pack response held three full-pack-sized binaries (raw
pack + sideband-wrapped pack + concatenated final response) plus the object
list, peaking at ~1.8 GB transient heap and contributing to OOM-kills on the
host (Anvil prod, 2026-05-11 02:50).

What's new (additive, no API breaks):

* Writer.generate_stream/2 and /3: invoke a write callback for each pack
  chunk (header, one entry per object, trailing SHA-1). SHA-1 is computed
  incrementally via :crypto.hash_update/2 so no intermediate full-pack
  binary exists. /3 threads an accumulator through each write for stateful
  sinks.
* PktLine.encode_sideband_frame/2 + max_sideband_data/0: single-frame
  encoder for streaming callers. encode_sideband/2 is unchanged.
* Protocol.SidebandWriter: re-chunks arbitrary-sized writes into
  spec-compliant sideband-1 frames (≤ 65515 bytes of data per frame).
  Buffer held as iodata for O(1) amortized appends.
* UploadPackV2.feed/3: streaming counterpart to feed/2. ls-refs and
  multi-round ack responses still arrive as a single write_fn call;
  packfile responses stream through the sideband chunker.

Tests:

* Writer: streamed bytes are byte-identical to Writer.generate/1 for small
  and many-mixed-object pack inputs. /3 accumulator threading verified.
* SidebandWriter: small writes coalesce, oversized writes split into
  spec-compliant frames, exact-boundary writes drain cleanly.
* UploadPackV2: ls-refs, multi-round acks, and full clone responses are
  byte-identical via feed/3 vs feed/2. A 70 KiB-blob fetch emits multiple
  chunks (not one giant binary) with every chunk ≤ 65520 bytes.

Existing feed/2 path is untouched; consumers migrate at their own pace.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
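The re-chunking that the commit message attributes to Protocol.SidebandWriter can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the commit's implementation: `SidebandSketch`, `encode_frame/2`, and `frames/2` are hypothetical names, the coalescing of small writes is left out, and the 65515-byte data limit is taken from the message (a 65520-byte pkt-line minus four length digits and one band byte).

```elixir
defmodule SidebandSketch do
  # Hypothetical names throughout; this is not the commit's Protocol.SidebandWriter.
  # Largest payload per frame: 65_520-byte pkt-line minus 4 length digits and 1 band byte.
  @max_data 65_515

  # Encode one frame as iodata: 4 lowercase hex length digits, band byte, payload.
  # The length counts the 4 digits and the band byte as well as the payload.
  def encode_frame(band, data)
      when band in 1..3 and is_binary(data) and byte_size(data) <= @max_data do
    len = byte_size(data) + 5
    hex = len |> Integer.to_string(16) |> String.downcase() |> String.pad_leading(4, "0")
    [hex, <<band>>, data]
  end

  # Split an arbitrarily large binary into frames that each respect @max_data.
  def frames(band, data) when is_binary(data) do
    data |> split() |> Enum.map(&encode_frame(band, &1))
  end

  defp split(<<>>), do: []
  defp split(data) when byte_size(data) <= @max_data, do: [data]

  defp split(data) do
    chunk = binary_part(data, 0, @max_data)
    rest = binary_part(data, @max_data, byte_size(data) - @max_data)
    [chunk | split(rest)]
  end
end

# A 70 KiB payload of zero bytes, sent on band 1 (pack data).
blob = :binary.copy(<<0>>, 70 * 1024)
frames = SidebandSketch.frames(1, blob)
sizes = Enum.map(frames, &IO.iodata_length/1)
# sizes == [65_520, 6_170]
```

With a 70 KiB input the sketch yields two frames, the first sitting exactly at the 65520-byte ceiling. That is the same multi-chunk property the UploadPackV2 test in the message checks, although the real path frames compressed pack bytes rather than a raw blob.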
SHA: 44a17299073eb06719b0ccbf12678b75b33525d2
Author: CI <ci@anvil.test>
Date: 2026-05-11 14:53
Parents: 0ac0140
7 files changed +703 -1
lib/ex_git_objectstore/pack/writer.ex +54 −0
@@ -63,6 +63,60 @@
  end

  @doc """
  Generate a packfile by streaming bytes through a write callback.

  `write_fn` is invoked with each piece of the pack as it's produced:
  first the 12-byte pack header, then one zlib-compressed entry per
  object, then the trailing 20-byte SHA-1 checksum. SHA-1 is computed
  incrementally so no intermediate full-pack binary is held in memory;
  peak heap is bounded by the largest single object.

  Two arities are exposed:

    * `generate_stream/2`: `write_fn` is `(binary -> any)`, side-effecting.
      Convenient for tests / simple sinks.
    * `generate_stream/3`: `write_fn` is `(binary, acc -> acc)`, threading
      caller state through each chunk. Used by the protocol layer so a
      sideband chunker can accumulate buffered bytes between calls.

  Returns `{pack_sha, count}` for `/2`, `{pack_sha, count, final_acc}` for `/3`.
  """
  @spec generate_stream([object_entry()], (binary() -> any())) ::
          {String.t(), non_neg_integer()}
  def generate_stream(objects, write_fn) when is_list(objects) and is_function(write_fn, 1) do
    {sha, count, nil} =
      generate_stream(objects, nil, fn bytes, _acc ->
        write_fn.(bytes)
        nil
      end)

    {sha, count}
  end

  @spec generate_stream([object_entry()], acc, (binary(), acc -> acc)) ::
          {String.t(), non_neg_integer(), acc}
        when acc: var
  def generate_stream(objects, acc, write_fn)
      when is_list(objects) and is_function(write_fn, 2) do
    count = length(objects)
    header = <<@pack_signature, @pack_version::unsigned-big-32, count::unsigned-big-32>>

    hasher = :crypto.hash_init(:sha)
    acc = write_fn.(header, acc)
    hasher = :crypto.hash_update(hasher, header)

    {hasher, acc} =
      Enum.reduce(objects, {hasher, acc}, fn {type, content, _sha}, {h, a} ->
        entry = IO.iodata_to_binary(encode_entry(type_to_num(type), content))
        a = write_fn.(entry, a)
        {:crypto.hash_update(h, entry), a}
      end)

    checksum = :crypto.hash_final(hasher)
    acc = write_fn.(checksum, acc)

    {Base.encode16(checksum, case: :lower), count, acc}
  end
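The incremental hashing and accumulator threading used by `generate_stream/3` can be looked at in isolation. The sketch below is a stand-in under stated assumptions, not the commit's code: `StreamHashSketch` and `stream/3` are hypothetical names, and a few arbitrary binaries take the place of real pack entries. It relies only on `:crypto.hash_init/1`, `:crypto.hash_update/2`, and `:crypto.hash_final/1`.

```elixir
defmodule StreamHashSketch do
  # Hypothetical stand-in for the hashing core of generate_stream/3.
  # Each chunk goes to write_fn and into the running SHA-1; the chunks are
  # never concatenated inside this function.
  def stream(chunks, acc, write_fn) when is_list(chunks) and is_function(write_fn, 2) do
    {hasher, acc} =
      Enum.reduce(chunks, {:crypto.hash_init(:sha), acc}, fn chunk, {h, a} ->
        {:crypto.hash_update(h, chunk), write_fn.(chunk, a)}
      end)

    digest = :crypto.hash_final(hasher)

    # The 20-byte digest is written as the final chunk, like a pack trailer.
    {Base.encode16(digest, case: :lower), write_fn.(digest, acc)}
  end
end

# Assumed input: three arbitrary binaries standing in for a header and entries.
chunks = ["PACK", <<0, 0, 0, 2>>, <<0, 0, 0, 0>>]

# The accumulator here is an iodata list that simply collects every write.
{sha, collected} = StreamHashSketch.stream(chunks, [], fn bytes, acc -> [acc, bytes] end)
```

Collecting into iodata is only for demonstration; a real sink would forward each chunk instead. The point of the sketch is that the trailing digest equals a one-shot `:crypto.hash(:sha, ...)` over the preceding bytes, so streaming does not change the output.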

  @doc """
  Generate both a packfile and its .idx v2 index.

  Returns `{pack_data, idx_data, pack_sha}`.
  """