ref:469c525b9be8e16c0427d548900b70c2ef438c17

perf(diff): swap Myers V table from Map to :atomics

After the linear-space algorithm landed, Phase A on chiron PR #68 took 20s wall-clock, too slow for a page mount. 21k garbage collections during that 20s, almost all from `Map.put` churn in the inner loop.

`:atomics` is the right primitive for V here: fixed-size array of signed 64-bit ints, mutable in place, lives off the BEAM term heap, no GC pressure. The diagonal index k maps cleanly onto a 1-indexed slot.

Switched all V access from Map.get/put to :atomics.get/put. Sentinel -1 ("not yet reached") still works since :atomics defaults to signed. The arrays are explicitly initialized to -1 at bisect entry (atomics default to 0, which collides with the legitimate "reached at x=0" state). Per-bisect init is O(N+M): same magnitude as the algorithm's own work, no asymptotic change.

Mutability simplifies the recursion too: forward_sweep and reverse_sweep no longer thread updated v1/v2 through their return values. They mutate in place and return only the bounds.

Bench (10k lines, ~33% diff):
  wall       12.2s → 4.7s   (2.6× faster)
  peak heap  9.1MB → 4.5MB  (2× lower)
  GCs        ~21k → ~1k     (~20× fewer)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
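One way to observe the kind of GC count quoted in the message is to sample `:erlang.statistics(:garbage_collection)` before and after a workload. The sketch below is an illustration under stated assumptions, not the benchmark used for this commit: `GcCountSketch`, `count_gcs/1`, `map_writes/1`, and `atomics_writes/1` are hypothetical names, and the counter is VM-wide, so other processes can add noise to the reading.

```elixir
defmodule GcCountSketch do
  # Hypothetical helper (not part of the commit): runs `fun` and returns
  # {result, gcs}, where gcs is the number of garbage collections the whole
  # VM performed while `fun` was running.
  def count_gcs(fun) when is_function(fun, 0) do
    {gcs_before, _words_before, _} = :erlang.statistics(:garbage_collection)
    result = fun.()
    {gcs_after, _words_after, _} = :erlang.statistics(:garbage_collection)
    {result, gcs_after - gcs_before}
  end

  # Immutable table: every write builds an updated map on the process heap.
  def map_writes(n) do
    Enum.reduce(1..n, %{}, fn i, v -> Map.put(v, rem(i, 1024), i) end)
  end

  # Mutable table: writes go to an :atomics array outside the term heap.
  # :atomics is 1-indexed, hence the + 1.
  def atomics_writes(n) do
    v = :atomics.new(1024, signed: true)
    Enum.each(1..n, fn i -> :atomics.put(v, rem(i, 1024) + 1, i) end)
    v
  end
end
```

Calling `GcCountSketch.count_gcs(fn -> GcCountSketch.map_writes(1_000_000) end)` and the same with `atomics_writes/1` gives two counts to compare. The absolute numbers depend on the VM, heap settings, and whatever else is running.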
SHA: 469c525b9be8e16c0427d548900b70c2ef438c17
Author: Cole Christensen <cole.christensen@macmillan.com>
Date: 2026-04-29 04:57
Parents: dc2d28f
1 files changed +82 -147
lib/ex_git_objectstore/diff/myers.ex +82 −147
@@ -24,24 +24,19 @@
Memory: O(N+M) total. Each recursion frame holds two V tables of size
~max_d (= (n+m+1)/2 in that frame) for the duration of one bisect; once
bisect returns, the V tables are GC'd before recursing. Recursion depth
is O(log(N+M)) on average.
bisect returns, the V tables are freed. Recursion depth is O(log(N+M))
on average.
V tables are `:atomics` arrays (mutable in place, off the BEAM term heap,
O(1) get/put, no GC pressure). The original implementation used `Map` for
V; correctness was identical but the Map.put churn drove GC count to
~1000/sec on real inputs and stretched a 200-millisecond computation into
This module is a careful Elixir translation of Google diff_match_patch's
`diff_bisect` (https://github.com/google/diff-match-patch), which is
itself a faithful port of Myers §4b. The C reference is git xdiff's
a 20-second one.
`xdl_split` / `xdl_recs_cmp` (https://github.com/git/git/blob/master/xdiff/xdiffi.c).
Translation reference: Google diff_match_patch's `diff_bisect`
(https://github.com/google/diff-match-patch), itself a faithful port of
## Why we needed this
The previous hand-rolled implementation kept every V-table from every `d`
iteration in a list (O(D²) memory) and used a `Map` for V (O(log n) per
access). On large inputs the BEAM allocator climbed into the GBs and
cgroup-killed under load. Stdlib's `List.myers_difference/2` is also
naive Myers (one path-with-suffix per diagonal kept alive simultaneously
— O(D × max(D, N+M)) memory) and exhibits the same problem. This module
is the proper linear-space variant, the same algorithm git uses by
Myers §4b. Cross-checked against git xdiff's `xdl_split`
(https://github.com/git/git/blob/master/xdiff/xdiffi.c).
default in its `xdiff` library.
"""
@type edit :: {:eq, term()} | {:ins, term()} | {:del, term()}
@@ -123,61 +118,41 @@
# ── Bisect — find the middle-snake split point ────────────────────────
#
# Translation of diff_match_patch's diff_bisect (Python). Returns
# `{x, y}` in ABSOLUTE coordinates (within the original a/b) such that
# the optimal edit script from `a[a_lo..a_hi)` to `b[b_lo..b_hi)` passes
# through (x, y) in the middle of the edit graph.
#
# V tables are stored as Maps keyed by k_offset = v_offset + k. Sentinel
# V tables are `:atomics` arrays of signed 64-bit integers. Index range
# [1, 2*max_d + 1] (atomics is 1-indexed). The diagonal-to-index mapping
# is `index = k_offset + 1` where `k_offset = v_offset + k`. Sentinel
# `-1` means "not yet reached" — distinct from "reached at x=0".
# `-1` means "not yet reached" (distinct from "reached at x=0"). Map is
# used for clarity; could be swapped for `:atomics` for speed once
# correctness is established.
defp bisect(a, b, a_lo, a_hi, b_lo, b_hi) do
n = a_hi - a_lo
m = b_hi - b_lo
max_d = div(n + m + 1, 2)
v_offset = max_d
size = 2 * max_d + 1
v1 = :atomics.new(size, signed: true)
v2 = :atomics.new(size, signed: true)
fill_neg1(v1, size)
# Init: V_f[1] = 0 means "forward path on diagonal 1 is at x=0 (start)".
# Same for V_r. All other entries default to -1 (not reached).
v1 = %{(v_offset + 1) => 0}
fill_neg1(v2, size)
v2 = %{(v_offset + 1) => 0}
# Myers init: V[1] = 0
vput(v1, v_offset + 1, 0)
vput(v2, v_offset + 1, 0)
delta = n - m
front? = rem(delta, 2) != 0
bisect_d(a, b, a_lo, b_lo, n, m, max_d, v_offset, delta, front?, v1, v2, 0, 0, 0, 0, 0)
end
# bisect_d/17: search depth d. For each d, walk the forward path one step
# bisect_d: search depth d. For each d, walk the forward path one step,
# then the reverse path one step. Check overlap on the appropriate side
# (forward when delta is odd, reverse when delta is even).
defp bisect_d(
_a,
defp bisect_d(_a, _b, a_lo, b_lo, n, _m, max_d, _v, _delta, _front?, _v1, _v2, d, _, _, _, _)
_b,
a_lo,
b_lo,
n,
_m,
max_d,
_v_off,
_delta,
_front?,
_v1,
_v2,
d,
_k1s,
_k1e,
_k2s,
_k2e
)
when d >= max_d do
# No middle snake found within max_d iterations — happens when D >= 2 but
# max_d is too small (e.g. tiny inputs like n=m=1 with no match) or no
# commonality at all. Fall back to "split at the top-right corner": left
# half becomes (full a, empty b) → all deletes, right half becomes
# (empty a, full b) → all inserts. Both halves are STRICTLY smaller, so
# No middle snake found within max_d iterations — tiny inputs (e.g.
# n=m=1 with no match) or no commonality at all. Fall back to
# "split at top-right": left half = (full a, empty b) → all deletes,
# right half = (empty a, full b) → all inserts. Both strictly smaller.
# the recursion terminates.
{a_lo + n, b_lo}
end
@@ -221,7 +196,7 @@
{:found, x, y} ->
{a_lo + x, b_lo + y}
{:cont, k1s_new, k1e_new} ->
{:cont, v1_new, k1s_new, k1e_new} ->
case reverse_sweep(
a,
b,
@@ -232,7 +207,7 @@
v_off,
delta,
front?,
v1_new,
v1,
v2,
d,
-d + k2s,
@@ -242,7 +217,7 @@
{:found, x, y} ->
{a_lo + x, b_lo + y}
{:cont, v2_new, k2s_new, k2e_new} ->
{:cont, k2s_new, k2e_new} ->
bisect_d(
a,
b,
@@ -254,8 +229,8 @@
v_off,
delta,
front?,
v1_new,
v2_new,
v1,
v2,
d + 1,
k1s_new,
k1e_new,
@@ -266,9 +241,9 @@
end
end
# Forward sweep over diagonals k = -d+k1s, -d+k1s+2, ..., d-k1e.
# Mutates v1 in place. Returns {:found, x, y} on overlap (when delta is
# odd) or {:cont, new_k1s, new_k1e} otherwise.
# Forward sweep: walk diagonals k = -d+k1s, -d+k1s+2, ..., d-k1e.
# Returns {:found, x, y} on overlap (when delta is odd), else
# {:cont, v1, k1s, k1e} with possibly-adjusted bounds.
defp forward_sweep(
_a,
_b,
@@ -279,7 +254,7 @@
_v_off,
_delta,
_front?,
v1,
_v1,
_v2,
d,
k1,
@@ -287,35 +262,33 @@
k1e
)
when k1 > d - k1e do
{:cont, v1, k1s, k1e}
{:cont, k1s, k1e}
end
defp forward_sweep(a, b, a_lo, b_lo, n, m, v_off, delta, front?, v1, v2, d, k1, k1s, k1e) do
k1_off = v_off + k1
# Pick predecessor — down (k+1) preferred over right (k-1) when tied.
x1 =
cond do
k1 == -d ->
max(vget(v1, k1_off + 1), 0)
Map.get(v1, k1_off + 1, -1) |> max(0)
k1 == d ->
Map.get(v1, k1_off - 1, -1) + 1
vget(v1, k1_off - 1) + 1
vget(v1, k1_off - 1) < vget(v1, k1_off + 1) ->
vget(v1, k1_off + 1)
Map.get(v1, k1_off - 1, -1) < Map.get(v1, k1_off + 1, -1) ->
Map.get(v1, k1_off + 1, -1)
true ->
vget(v1, k1_off - 1) + 1
Map.get(v1, k1_off - 1, -1) + 1
end
y1 = x1 - k1
{x1, y1} = snake_forward(a, b, a_lo, b_lo, n, m, x1, y1)
v1 = Map.put(v1, k1_off, x1)
vput(v1, k1_off, x1)
cond do
x1 > n ->
# Ran off the right edge — stop extending this side, shrink k range.
forward_sweep(
a,
b,
@@ -335,7 +308,6 @@
)
y1 > m ->
# Ran off the bottom — shrink the OTHER end of the k range.
forward_sweep(
a,
b,
@@ -355,37 +327,13 @@
)
front? ->
# Overlap check on forward sweep (delta is odd).
k2_off = v_off + delta - k1
if k2_off >= 0 and k2_off <= 2 * v_off do
v2_x = vget(v2, k2_off)
if k2_off >= 0 and k2_off < 2 * v_off + 1 do
v2_x = Map.get(v2, k2_off, -1)
if v2_x != -1 and x1 >= n - v2_x do
{:found, x1, y1}
if v2_x != -1 do
# Reverse coordinate → forward: x2_forward = n - v2_x
x2_fwd = n - v2_x
if x1 >= x2_fwd do
{:found, x1, y1}
else
forward_sweep(
a,
b,
a_lo,
b_lo,
n,
m,
v_off,
delta,
front?,
v1,
v2,
d,
k1 + 2,
k1s,
k1e
)
end
else
forward_sweep(
a,
@@ -414,8 +362,8 @@
end
end
# Reverse sweep: same shape, mirrored. Snake compares from the END going
# Reverse sweep — mirror of forward_sweep. Snake compares from the END
# going backward.
# backward.
defp reverse_sweep(
_a,
_b,
@@ -427,14 +375,14 @@
_delta,
_front?,
_v1,
_v2,
v2,
d,
k2,
k2s,
k2e
)
when k2 > d - k2e do
{:cont, v2, k2s, k2e}
{:cont, k2s, k2e}
end
defp reverse_sweep(a, b, a_lo, b_lo, n, m, v_off, delta, front?, v1, v2, d, k2, k2s, k2e) do
@@ -443,21 +391,21 @@
x2 =
cond do
k2 == -d ->
Map.get(v2, k2_off + 1, -1) |> max(0)
max(vget(v2, k2_off + 1), 0)
k2 == d ->
Map.get(v2, k2_off - 1, -1) + 1
vget(v2, k2_off - 1) + 1
Map.get(v2, k2_off - 1, -1) < Map.get(v2, k2_off + 1, -1) ->
Map.get(v2, k2_off + 1, -1)
vget(v2, k2_off - 1) < vget(v2, k2_off + 1) ->
vget(v2, k2_off + 1)
true ->
vget(v2, k2_off - 1) + 1
Map.get(v2, k2_off - 1, -1) + 1
end
y2 = x2 - k2
{x2, y2} = snake_reverse(a, b, a_lo, b_lo, n, m, x2, y2)
vput(v2, k2_off, x2)
v2 = Map.put(v2, k2_off, x2)
cond do
x2 > n ->
@@ -499,39 +447,15 @@
)
not front? ->
# Overlap check on reverse sweep (delta is even).
k1_off = v_off + delta - k2
if k1_off >= 0 and k1_off < 2 * v_off + 1 do
v1_x = Map.get(v1, k1_off, -1)
if k1_off >= 0 and k1_off <= 2 * v_off do
v1_x = vget(v1, k1_off)
if v1_x != -1 do
if v1_x != -1 and v1_x >= n - x2 do
x1_fwd = v1_x
y1_fwd = x1_fwd - (k1_off - v_off)
x2_fwd = n - x2
if x1_fwd >= x2_fwd do
# Use forward coordinates of the meeting point.
y1_fwd = x1_fwd - (k1_off - v_off)
{:found, x1_fwd, y1_fwd}
{:found, x1_fwd, y1_fwd}
else
reverse_sweep(
a,
b,
a_lo,
b_lo,
n,
m,
v_off,
delta,
front?,
v1,
v2,
d,
k2 + 2,
k2s,
k2e
)
end
else
reverse_sweep(
a,
@@ -561,9 +485,6 @@
end
# ── Snakes ────────────────────────────────────────────────────────────
#
# snake_forward: extend (x, y) forward (in local coordinates within the
# bisect range) as long as a[a_lo + x] == b[b_lo + y].
defp snake_forward(a, b, a_lo, b_lo, n, m, x, y) when x < n and y < m do
if elem(a, a_lo + x) == elem(b, b_lo + y) do
@@ -574,9 +495,6 @@
end
defp snake_forward(_a, _b, _a_lo, _b_lo, _n, _m, x, y), do: {x, y}
# snake_reverse: extend (x, y) backward — local x, y are STEPS BACK from
# the (n, m) corner. Compare a[a_lo + n - 1 - x] vs b[b_lo + m - 1 - y].
defp snake_reverse(a, b, a_lo, b_lo, n, m, x, y) when x < n and y < m do
ai = a_lo + n - 1 - x
@@ -590,4 +508,21 @@
end
defp snake_reverse(_a, _b, _a_lo, _b_lo, _n, _m, x, y), do: {x, y}
# ── :atomics helpers ──────────────────────────────────────────────────
#
# atomics is 1-indexed; we add 1 to convert from k_offset (0-indexed) to
# atomics index. signed:true is the default but stated explicitly above.
defp vget(v, k_off), do: :atomics.get(v, k_off + 1)
defp vput(v, k_off, x), do: :atomics.put(v, k_off + 1, x)
defp fill_neg1(_v, 0), do: :ok
defp fill_neg1(v, i),
do:
(
:atomics.put(v, i, -1)
fill_neg1(v, i - 1)
)
end
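The V-table layout this diff introduces can be tried in isolation. The following is a minimal sketch, assuming a hypothetical standalone module `VTableSketch` that takes `max_d` and the diagonal `k` directly (the commit's own helpers take a precomputed `k_off` instead). It shows the two points the comments stress: every slot is explicitly filled with `-1` because a fresh `:atomics` array starts at 0, and the 0-indexed offset `max_d + k` is shifted by one because `:atomics` is 1-indexed.

```elixir
defmodule VTableSketch do
  # Hypothetical module for illustration only; not part of the commit.

  # Allocate a table covering diagonals k in -max_d..max_d and mark every
  # slot as "not yet reached" (-1). A fresh :atomics array is all zeros,
  # which would be indistinguishable from "reached at x = 0".
  def new(max_d) when is_integer(max_d) and max_d >= 0 do
    size = 2 * max_d + 1
    v = :atomics.new(size, signed: true)
    Enum.each(1..size, fn i -> :atomics.put(v, i, -1) end)
    v
  end

  # k_offset = max_d + k is 0-indexed; :atomics is 1-indexed, hence + 1.
  def get(v, max_d, k), do: :atomics.get(v, max_d + k + 1)

  # Writes mutate the array in place and return :ok, so callers do not need
  # to thread an updated table through their return values.
  def put(v, max_d, k, x), do: :atomics.put(v, max_d + k + 1, x)
end
```

With `v = VTableSketch.new(3)`, a call such as `VTableSketch.put(v, 3, 1, 0)` followed by `VTableSketch.get(v, 3, 1)` reads back the value written, while untouched diagonals still read as `-1`.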