fangorn/ex_git_objectstore
ref:96509fa11f87f61bf0df582a4ded83c09a3f79b0
perf(delta): unroll copy-arg decode — 90× faster, profile-driven (#30)
Sub-issue under fangorn/anvil#153 umbrella, validated this time with an actual profile.
## What the profile said
A sample-based stack profiler captured ~48k samples of `Pack.Reader.parse/2` applied to a real ovs pack (96 MB, 134k objects, 108k deltas). The dominant hot path was not the broken offset cache (#29), nor the `find_compressed_length` binary search I'd theorized as the "actual" bottleneck. It was `Pack.Delta.read_if_bit/3`:
| samples | path |
|---|---|
| 18,471 | apply > apply_instructions > read_copy_size |
| 14,628 | apply > apply_instructions > read_copy_offset |
| 1,625 | read_copy_offset > read_if_bit |
| 1,172 | read_copy_size > read_if_bit |
Together these paths account for ~74% of total CPU. Each copy command made 7 sequential pattern matches via `read_if_bit/3` (4 offset + 3 size), and the false branch reconstructed `<<byte, rest::binary>>` instead of returning the original binary. Across ovs's ~5 million delta instructions, that adds up to tens of millions of redundant binary allocations.
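To make the allocation problem concrete, here is a minimal sketch of the two shapes a per-bit reader can take. It is a hypothetical reconstruction for illustration, not the code this PR removed: the module name, function names, and argument order are all assumptions.

```elixir
defmodule ReadIfBitSketch do
  # Hypothetical illustration only; names and argument order are assumed.
  import Bitwise

  # Shape described above: when the flag bit is clear, the binary has already
  # been split into byte + rest, so it is rebuilt before being returned.
  def read_if_bit_rebuild(cmd, mask, <<byte, rest::binary>>) when (cmd &&& mask) != 0,
    do: {byte, rest}

  def read_if_bit_rebuild(_cmd, _mask, <<byte, rest::binary>>),
    do: {0, <<byte, rest::binary>>}

  # Cheaper shape: when the flag bit is clear, the input is returned untouched.
  def read_if_bit_passthrough(cmd, mask, <<byte, rest::binary>>) when (cmd &&& mask) != 0,
    do: {byte, rest}

  def read_if_bit_passthrough(_cmd, _mask, data), do: {0, data}
end
```

Both variants return the same values for non-empty input; they differ only in whether the clear-bit case constructs a new binary. Even the cheaper shape still costs one function call per flag bit, seven per copy command.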
## Fix
`apply_instructions/3`'s copy clause now calls `decode_copy_args(cmd, data)`, which consumes exactly the right number of bytes in one binary pattern match. The 128 specialized clauses (16 offset bitmaps × 8 size bitmaps) are generated at compile time via macros, so the BEAM compiler picks the matching clause in a single dispatch. No more `read_if_bit`, no more 7-step splits, no more no-op binary reconstruction.
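The clause generation described above could look roughly like the sketch below. It is not the code in `delta.ex`: the module name, the `{offset, size, rest}` return shape, and the `normalize_size/1` helper are assumptions made for illustration. The byte layout follows the Git pack delta format, where bits 0-3 of the command byte flag which little-endian offset bytes are present, bits 4-6 flag the size bytes, and an encoded size of 0 means 0x10000.

```elixir
defmodule CopyArgsSketch do
  # Hypothetical sketch; not the implementation in lib/ex_git_objectstore/pack/delta.ex.
  import Bitwise

  # One clause per (offset bitmap, size bitmap) pair: 16 × 8 = 128 clauses.
  for offset_bits <- 0..15, size_bits <- 0..7 do
    # Copy commands always have the high bit set.
    cmd = 0x80 ||| (size_bits <<< 4) ||| offset_bits

    # {variable AST, left shift} for every byte this command byte says is present.
    offset_parts =
      for i <- 0..3, (offset_bits &&& (1 <<< i)) != 0 do
        {Macro.var(:"o#{i}", nil), i * 8}
      end

    size_parts =
      for i <- 0..2, (size_bits &&& (1 <<< i)) != 0 do
        {Macro.var(:"s#{i}", nil), i * 8}
      end

    # The bytes appear in the stream in this order: offset bytes, then size bytes.
    byte_vars = for {var, _shift} <- offset_parts ++ size_parts, do: var

    # Builds the expression `0 ||| (b0 <<< 0) ||| (b1 <<< 8) ...` at compile time.
    combine = fn parts ->
      Enum.reduce(parts, 0, fn {var, shift}, acc ->
        quote do: unquote(acc) ||| (unquote(var) <<< unquote(shift))
      end)
    end

    # A single pattern match consumes exactly the bytes this cmd requires.
    def decode_copy_args(unquote(cmd), <<unquote_splicing(byte_vars), rest::binary>>) do
      offset = unquote(combine.(offset_parts))
      size = unquote(combine.(size_parts))
      {offset, normalize_size(size), rest}
    end
  end

  # In the pack delta format, a copy size of 0 stands for 0x10000.
  defp normalize_size(0), do: 0x10000
  defp normalize_size(size), do: size
end
```

Under these assumptions, `CopyArgsSketch.decode_copy_args(0x93, <<0x34, 0x12, 0x10, "tail">>)` takes two offset bytes and one size byte in one match and returns `{0x1234, 0x10, "tail"}`. Because the command byte is a literal in every clause head, choosing the clause is a plain integer dispatch rather than a chain of seven bit tests.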
## Measured improvement
Same `Pack.Reader.parse/2` on the same 96 MB ovs pack:
| Before | After |
|---|---|
| >27 min, never completed | **18.4 s** |
In the post-fix profile, `Pack.Delta` dropped from ~37k samples (~74%) to ~270 samples (~12% of the remaining samples). The new top entry is `decompress_data → probe_compressed_length` (the binary-search bottleneck I'd theorized originally); that's the next sub-issue tracked under #153.
## Test plan
- [x] 928 tests, 0 failures across the full ex_git_objectstore suite.
- [x] `mix format --check-formatted` clean.
- [x] `mix dialyzer` clean.
- [ ] Live ovs push test against prod once this and the anvil mix.lock bump deploy. Expectation: parse phase drops from 'never finishes' to ~20 s. Total push: ~2 min.
## Memory note
Peak RSS during the 18 s parse was ~3.5 GB on this pack — GC churn plus the buffered resolved entries list. Streaming parse-and-store is still the next big sub-issue; this PR just makes the existing path actually finish in finite time.
SHA: 96509fa11f87f61bf0df582a4ded83c09a3f79b0
Author: Anvil <noreply@anvil.fangorn.io>
Date: 2026-05-06 15:58
Parents: 6994d00
1 file changed, +98 −39

| File | Changes |
|---|---|
| lib/ex_git_objectstore/pack/delta.ex | +98 −39 |