
perf(receive-pack): drop the throttled-SHA cheat — feed absorb-only, flush verifies (#28)

Honest replacement for the 4 MB throttled SHA-1 check that landed in #27. Per docs/PERFORMANCE.md, that throttle was the scheduling-around-the-cost cheat: same big-O as the original O(N²) bug, just a bigger denominator.

## What changed

`feed/2` is now absorb-only: appends bytes to an iolist body, maintains a 20-byte lookahead, never verifies, never decides completeness, never transitions out of `:pack`. Per-chunk cost: O(1) amortized. Total streaming cost: O(N).

`flush/1` is the explicit end-of-stream signal. It hands the materialized buffer to `Pack.Reader.parse/2` once. That parse pass already runs the SHA-1 trailer verification, so the previous ReceivePack-level check was both redundant AND quadratic. Total verify work: one O(N) hash + one parse, period.

Errors (corrupted trailer, truncated body, malformed entries) now flow through Pack.Reader's `{:error, reason}` and surface as a proper `unpack <reason>` line in report-status. State transitions to `:done` so the channel doesn't hang on a closed connection.

## Caller contract change

`feed/2` no longer auto-finalizes. Every transport must call flush at end-of-stream:

- **SSH** (`Anvil.SSH.CLI`): already wired in anvil#127's `{:eof, channel_id}` handler.
- **HTTP** (`AnvilWeb.GitHttpController.receive_pack`): needs flush after `read_full_body`. Bundled with the mix.lock bump in the companion anvil PR.
- **Test transports**: `git_daemon.ex` flushes on socket close; the LFS HTTP adapter flushes after one-shot body read. Both updated here.

## Test plan

- [x] 928/0 in the full ex_git_objectstore suite.
- [x] All test files using direct `ReceivePack.feed` updated to follow the new feed-then-flush contract.
- [x] New assertions in `protocol_interop_test` pin the corrupted-trailer + truncated-pack behavior: state transitions to `:done` with `{:error, _}` and reports a structured `unpack` failure, rather than sitting in `:pack` until the channel closes.
- [x] `mix format --check-formatted` clean.
- [x] `mix dialyzer` clean.

## Memory note

Pack body is still fully buffered in memory until flush. Tracked in anvil#153; the path forward is a streaming Pack.Reader that consumes bytes and emits parsed objects incrementally. For ovs (~106 MB pack) the full buffer fits comfortably under the 3.82 GiB container cap.
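
For readers who want the feed-then-flush contract in runnable form, here is a minimal sketch. It is not the ReceivePack implementation: the module name `AbsorbOnlySketch`, its struct fields, and the reason strings are hypothetical, the response lines are plain strings without pkt-line framing, and a single `:crypto.hash/2` call stands in for the verification that `Pack.Reader.parse/2` performs in the real code.

```elixir
defmodule AbsorbOnlySketch do
  # Hypothetical stand-in for the pack phase of ReceivePack. All names
  # here are illustrative; none of this is the real module.
  @trailer_len 20

  defstruct body: [], tail: <<>>, phase: :pack

  def new, do: %__MODULE__{}

  # Absorb-only: move everything except the newest 20 bytes into the
  # iolist. No hashing, no completeness check, phase stays :pack.
  def feed(%__MODULE__{phase: :pack} = st, chunk) when is_binary(chunk) do
    joined = st.tail <> chunk
    keep = min(byte_size(joined), @trailer_len)
    ready_len = byte_size(joined) - keep
    <<ready::binary-size(ready_len), tail::binary-size(keep)>> = joined
    {<<>>, %{st | body: [st.body, ready], tail: tail}}
  end

  # Explicit end-of-stream: materialize once, hash once, always end in :done.
  def flush(%__MODULE__{phase: :pack} = st) do
    body = IO.iodata_to_binary(st.body)

    line =
      cond do
        byte_size(st.tail) < @trailer_len -> "unpack truncated\n"
        :crypto.hash(:sha, body) != st.tail -> "unpack bad_trailer\n"
        true -> "unpack ok\n"
      end

    {line, %{st | phase: :done}}
  end

  # Flushing twice is a no-op in this sketch.
  def flush(%__MODULE__{phase: :done} = st), do: {<<>>, st}
end

# Usage: a fake "pack" is some payload followed by its SHA-1 trailer.
payload = "PACK" <> :binary.copy(<<0>>, 100)
pack = payload <> :crypto.hash(:sha, payload)
<<first::binary-size(50), rest::binary>> = pack

{<<>>, st} = AbsorbOnlySketch.feed(AbsorbOnlySketch.new(), first)
{<<>>, st} = AbsorbOnlySketch.feed(st, rest)
:pack = st.phase
{"unpack ok\n", %{phase: :done}} = AbsorbOnlySketch.flush(st)
```

The point of the shape is that no amount of feeding can finalize the state: only the transport knows when the stream has ended, so only the transport calls flush, and every flush outcome (success or failure) lands in `:done`.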
SHA: d5d4b73f3326b7bc6b5d2417ec09ffc4a7c9ce68
Author: Anvil <noreply@anvil.fangorn.io>
Date: 2026-05-06 05:24
Parents: 9d4658e
8 files changed +200 -202
test/support/git_daemon.ex +23 −12
@@ -253,7 +253,14 @@
   defp drive_receive_recv(client, state) do
     case :gen_tcp.recv(client, 0, 15_000) do
-      {:ok, data} -> drive_receive_feed(client, state, data)
-      {:error, _} -> :ok
+      {:ok, data} ->
+        drive_receive_feed(client, state, data)
+      {:error, _} ->
+        # Client closed its side — that's end-of-stream. Flush so the
+        # state machine actually finalizes (the absorb-only feed/2
+        # contract leaves it in :pack until somebody calls flush).
+        _ = drive_receive_flush(client, state)
+        :ok
     end
   end
@@ -264,6 +271,16 @@
     drive_receive(client, new_state)
   end
+  # When the TCP recv loop returns an error/timeout (i.e. the client
+  # closed its side), we treat that as end-of-stream and flush. Real
+  # SSH/HTTP layers do the same on their respective EOF signals; the
+  # test daemon just polls until socket close.
+  defp drive_receive_flush(client, state) do
+    {response, new_state} = ReceivePack.flush(state)
+    if byte_size(response) > 0, do: :ok = :gen_tcp.send(client, response)
+    new_state
+  end
   # --- shared ---
   defp read_one_pkt(client) do
@@ -371,17 +388,11 @@
   end
   defp drive_receive_state(body, state) do
+    # HTTP delivers the whole request body in one shot — feed once,
+    # then flush since the body has a known length.
     {first, state} = ReceivePack.feed(state, body)
-    if ReceivePack.done?(state) do
+    {second, state} = ReceivePack.flush(state)
+    {first <> second, state}
-      {first, state}
-    else
-      # Some state machines accept the pack across multiple feeds; for
-      # HTTP the whole body arrives at once, so a second empty feed is
-      # enough to let maybe_process_pack run.
-      {second, state} = ReceivePack.feed(state, <<>>)
-      {first <> second, state}
-    end
   end
   # --- HTTP/1.1 mini-parser ---
test/support/lfs_http_adapter.ex +3 −1
@@ -91,7 +91,9 @@
     repo = build_repo(conn, repo_id)
     {:ok, conn, body} = read_full_body(conn)
     {_advert, state} = ReceivePack.init(repo)
-    {response, _state} = ReceivePack.feed(state, body)
+    {feed_resp, state} = ReceivePack.feed(state, body)
+    {flush_resp, _state} = ReceivePack.flush(state)
+    response = feed_resp <> flush_resp
     conn
     |> put_resp_content_type("application/x-git-receive-pack-result")