fangorn/ex_git_objectstore
public
Myers diff uses O(D²) memory and Map-as-V — fix linear-space + counters #59
Links
No links yet.
Summary
ExGitObjectstore.Diff.Myers (lib/ex_git_objectstore/diff/myers.ex) uses two structurally bad memory patterns that together OOM the BEAM on large diffs. Surfaced as the root cause of Anvil chiron PR #68 (~OOM tracked in fangorn/anvil#81).
Reproducer
Anvil.Git.diff for chiron PR #68, profiled inside a 6 GB-capped container by lib/mix/tasks/anvil.profile_diff_memory.ex:
- BEAM allocators climb to 16 GB (cgroup capped at 6, sawtooth thrashing for ~4 minutes)
- 75% of CPU in
Myers.process_diag/Myers.snake/Map.get - 38% on
Map.get/3alone (V-table inner loop) - Phase A (
Git.diff) never completes
Flame graph + memory TSV captured under priv/profiles/pr68/ in the anvil repo.
Root causes
A. Trace keeps every V table alive — O(D²) memory
myers.ex:67-78 accumulates one V map per d iteration into a list, then materializes it into a tuple for backtracking:
```elixir find_d(…, d + 1, v_new, [v_new | trace]) … Enum.reverse([v_new | trace]) ```
After step d, V has 2d+1 entries. Sum over the trace: (D+1)² total map entries kept alive simultaneously. At BEAM’s ~80 B/Map-node, a file with D=10k edits costs ~8 GB just for the V trace.
This is the classical Myers naive variant. Fix: linear-space refinement from Myers 1986 §4b — divide-and-conquer at the middle snake, recursing on the two halves. Memory becomes O(N+M) regardless of D.
B. V is a Map, not a contiguous array
myers.ex:51, 85-93 — V is %{1 => 0} with Map.put / Map.get. The diagonal index k is a contiguous integer range [-d, +d] — a perfect fit for a tuple/array, not a hash map.
- ~80 B per Map entry vs ~16 B per
:countersslot (~5× per-slot cost) Map.get/putis O(log n);:counters.get/putis O(1)
Only safe to use :counters after Fix A — once only one V is alive at a time.
Acceptance criteria
Diff.diff_blobs/4for synthetic 10k-line × 30%-divergent input completes under 200 MB process heap peak.- No
Map.getin the diff inner loop (verified by flame graph). - All existing diff tests pass byte-for-byte (output must be unchanged — only memory profile changes).
- Add a stress test that asserts the memory bound.
Plan
One PR, two commits:
- Replace
ses/find_d/backtrackwith linear-space middle-snake recursion. - Replace
MapV with:counters(depends on #1).
Closes Anvil chiron PR #68 OOM (tracked in fangorn/anvil#81).