Commit 469c525 - fangorn/ex_git_objectstore





perf(diff): swap Myers V table from Map to :atomics

After the linear-space algorithm landed, Phase A on chiron PR #68 took 20s wall-clock, which is too slow for a page mount. The run triggered 21k garbage collections in those 20s, almost all from `Map.put` churn in the inner loop.

`:atomics` is the right primitive for V here. It is a fixed-size array of signed 64-bit ints that is mutable in place and lives off the BEAM term heap, so it adds no GC pressure. The diagonal index k maps cleanly onto a 1-indexed slot. All V access switched from Map.get/put to :atomics.get/put.

The sentinel -1 ("not yet reached") still works because :atomics arrays are signed by default. The arrays are explicitly initialized to -1 at bisect entry. This is needed because atomics start at 0, which collides with the legitimate "reached at x=0" state. Per-bisect init is O(N+M), the same magnitude as the algorithm's own work, so there is no asymptotic change.

Mutability simplifies the recursion too. forward_sweep and reverse_sweep no longer thread updated v1/v2 through their return values. They mutate in place and return only the bounds.

Bench (10k lines, ~33% diff):
- wall: 12.2s → 4.7s (2.6× faster)
- peak heap: 9.1MB → 4.5MB (2× lower)
- GCs: ~21k → ~1k (~20× fewer)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
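The V-table scheme described above can be sketched in isolation. It uses signed slots, a 1-indexed offset per diagonal, and an explicit -1 fill because `:atomics` starts at 0. Writes mutate in place. This is a minimal illustration, not code from the commit. `VTableSketch`, `new/1`, `get/2`, `put/3` and `reached?/2` are hypothetical names. The sketch assumes the same mapping the diff uses, slot = k + max_d + 1. It fills with `Enum.each` where the commit uses a recursive helper.

```elixir
defmodule VTableSketch do
  # Hypothetical helper, not part of ex_git_objectstore. Holds one Myers
  # V table: diagonal k in -max_d..max_d lives at 1-indexed slot k + max_d + 1.

  @not_reached -1

  # :atomics.new/2 zero-fills, and 0 is a legitimate "reached at x = 0"
  # value, so every slot is reset to the -1 sentinel before use.
  def new(max_d) when max_d >= 0 do
    size = 2 * max_d + 1
    ref = :atomics.new(size, signed: true)
    Enum.each(1..size, fn i -> :atomics.put(ref, i, @not_reached) end)
    {ref, max_d}
  end

  def get({ref, max_d}, k), do: :atomics.get(ref, k + max_d + 1)

  # Writes in place and returns :ok; callers keep the same handle instead of
  # threading an updated table through their return values.
  def put({ref, max_d}, k, x), do: :atomics.put(ref, k + max_d + 1, x)

  def reached?(v, k), do: get(v, k) != @not_reached
end
```

Take `v = VTableSketch.new(3)` as an example. `VTableSketch.get(v, 1)` reads -1 until `VTableSketch.put(v, 1, 0)` runs, after which it reads 0. A freshly allocated `:atomics` array would have read 0 in both cases, and that is the ambiguity the explicit fill avoids.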

SHA: 469c525b9be8e16c0427d548900b70c2ef438c17

Author: Cole Christensen <cole.christensen@macmillan.com>

Date: 2026-04-29 04:57

Parents: dc2d28f

1 file changed +82 -147

	lib/ex_git_objectstore/diff/myers.ex	+82 −147


@@ -24,24 +24,19 @@
   Memory: O(N+M) total. Each recursion frame holds two V tables of size
   ~max_d (= (n+m+1)/2 in that frame) for the duration of one bisect; once
-  bisect returns, the V tables are GC'd before recursing. Recursion depth
-  is O(log(N+M)) on average.
+  bisect returns, the V tables are freed. Recursion depth is O(log(N+M))
+  on average.
-  This module is a careful Elixir translation of Google diff_match_patch's
-  `diff_bisect` (https://github.com/google/diff-match-patch), which is
-  itself a faithful port of Myers §4b. The C reference is git xdiff's
-  `xdl_split` / `xdl_recs_cmp` (https://github.com/git/git/blob/master/xdiff/xdiffi.c).
-  ## Why we needed this
-  The previous hand-rolled implementation kept every V-table from every `d`
-  iteration in a list (O(D²) memory) and used a `Map` for V (O(log n) per
-  access). On large inputs the BEAM allocator climbed into the GBs and
-  cgroup-killed under load. Stdlib's `List.myers_difference/2` is also
-  naive Myers (one path-with-suffix per diagonal kept alive simultaneously
-  — O(D × max(D, N+M)) memory) and exhibits the same problem. This module
-  is the proper linear-space variant, the same algorithm git uses by
-  default in its `xdiff` library.
+  V tables are `:atomics` arrays (mutable in place, off the BEAM term heap,
+  O(1) get/put, no GC pressure). The original implementation used `Map` for
+  V; correctness was identical but the Map.put churn drove GC count to
+  ~1000/sec on real inputs and stretched a 200-millisecond computation into
+  a 20-second one.
+  Translation reference: Google diff_match_patch's `diff_bisect`
+  (https://github.com/google/diff-match-patch), itself a faithful port of
+  Myers §4b. Cross-checked against git xdiff's `xdl_split`
+  (https://github.com/git/git/blob/master/xdiff/xdiffi.c).
   """
   @type edit :: {:eq, term()} | {:ins, term()} | {:del, term()}
@@ -123,61 +118,41 @@
   # ── Bisect — find the middle-snake split point ────────────────────────
   #
   # Translation of diff_match_patch's diff_bisect (Python). Returns
   # `{x, y}` in ABSOLUTE coordinates (within the original a/b) such that
   # the optimal edit script from `a[a_lo..a_hi)` to `b[b_lo..b_hi)` passes
   # through (x, y) in the middle of the edit graph.
   #
-  # V tables are stored as Maps keyed by k_offset = v_offset + k. Sentinel
-  # `-1` means "not yet reached" (distinct from "reached at x=0"). Map is
-  # used for clarity; could be swapped for `:atomics` for speed once
-  # correctness is established.
+  # V tables are `:atomics` arrays of signed 64-bit integers. Index range
+  # [1, 2*max_d + 1] (atomics is 1-indexed). The diagonal-to-index mapping
+  # is `index = k_offset + 1` where `k_offset = v_offset + k`. Sentinel
+  # `-1` means "not yet reached" — distinct from "reached at x=0".
   defp bisect(a, b, a_lo, a_hi, b_lo, b_hi) do
     n = a_hi - a_lo
     m = b_hi - b_lo
     max_d = div(n + m + 1, 2)
     v_offset = max_d
-    # Init: V_f[1] = 0 means "forward path on diagonal 1 is at x=0 (start)".
-    # Same for V_r. All other entries default to -1 (not reached).
-    v1 = %{(v_offset + 1) => 0}
-    v2 = %{(v_offset + 1) => 0}
+    size = 2 * max_d + 1
+    v1 = :atomics.new(size, signed: true)
+    v2 = :atomics.new(size, signed: true)
+    fill_neg1(v1, size)
+    fill_neg1(v2, size)
+    # Myers init: V[1] = 0
+    vput(v1, v_offset + 1, 0)
+    vput(v2, v_offset + 1, 0)
     delta = n - m
     front? = rem(delta, 2) != 0
     bisect_d(a, b, a_lo, b_lo, n, m, max_d, v_offset, delta, front?, v1, v2, 0, 0, 0, 0, 0)
   end
-  # bisect_d/17: search depth d. For each d, walk the forward path one step
+  # bisect_d: search depth d. For each d, walk the forward path one step,
   # then the reverse path one step. Check overlap on the appropriate side
   # (forward when delta is odd, reverse when delta is even).
-  defp bisect_d(
-         _a,
-         _b,
-         a_lo,
-         b_lo,
-         n,
-         _m,
-         max_d,
-         _v_off,
-         _delta,
-         _front?,
-         _v1,
-         _v2,
-         d,
-         _k1s,
-         _k1e,
-         _k2s,
-         _k2e
-       )
+  defp bisect_d(_a, _b, a_lo, b_lo, n, _m, max_d, _v, _delta, _front?, _v1, _v2, d, _, _, _, _)
        when d >= max_d do
-    # No middle snake found within max_d iterations — happens when D >= 2 but
-    # max_d is too small (e.g. tiny inputs like n=m=1 with no match) or no
-    # commonality at all. Fall back to "split at the top-right corner": left
-    # half becomes (full a, empty b) → all deletes, right half becomes
-    # (empty a, full b) → all inserts. Both halves are STRICTLY smaller, so
-    # the recursion terminates.
+    # No middle snake found within max_d iterations — tiny inputs (e.g.
+    # n=m=1 with no match) or no commonality at all. Fall back to
+    # "split at top-right": left half = (full a, empty b) → all deletes,
+    # right half = (empty a, full b) → all inserts. Both strictly smaller.
     {a_lo + n, b_lo}
   end
@@ -221,7 +196,7 @@
       {:found, x, y} ->
         {a_lo + x, b_lo + y}
-      {:cont, v1_new, k1s_new, k1e_new} ->
+      {:cont, k1s_new, k1e_new} ->
         case reverse_sweep(
                a,
                b,
@@ -232,7 +207,7 @@
                v_off,
                delta,
                front?,
-               v1_new,
+               v1,
                v2,
                d,
                -d + k2s,
@@ -242,7 +217,7 @@
           {:found, x, y} ->
             {a_lo + x, b_lo + y}
-          {:cont, v2_new, k2s_new, k2e_new} ->
+          {:cont, k2s_new, k2e_new} ->
             bisect_d(
               a,
               b,
@@ -254,8 +229,8 @@
               v_off,
               delta,
               front?,
-              v1_new,
-              v2_new,
+              v1,
+              v2,
               d + 1,
               k1s_new,
               k1e_new,
@@ -266,9 +241,9 @@
     end
   end
-  # Forward sweep: walk diagonals k = -d+k1s, -d+k1s+2, ..., d-k1e.
-  # Returns {:found, x, y} on overlap (when delta is odd), else
-  # {:cont, v1, k1s, k1e} with possibly-adjusted bounds.
+  # Forward sweep over diagonals k = -d+k1s, -d+k1s+2, ..., d-k1e.
+  # Mutates v1 in place. Returns {:found, x, y} on overlap (when delta is
+  # odd) or {:cont, new_k1s, new_k1e} otherwise.
   defp forward_sweep(
          _a,
          _b,
@@ -279,7 +254,7 @@
          _v_off,
          _delta,
          _front?,
-         v1,
+         _v1,
          _v2,
          d,
          k1,
@@ -287,35 +262,33 @@
          k1e
        )
        when k1 > d - k1e do
-    {:cont, v1, k1s, k1e}
+    {:cont, k1s, k1e}
   end
   defp forward_sweep(a, b, a_lo, b_lo, n, m, v_off, delta, front?, v1, v2, d, k1, k1s, k1e) do
     k1_off = v_off + k1
-    # Pick predecessor — down (k+1) preferred over right (k-1) when tied.
     x1 =
       cond do
         k1 == -d ->
-          Map.get(v1, k1_off + 1, -1) |> max(0)
+          max(vget(v1, k1_off + 1), 0)
         k1 == d ->
-          Map.get(v1, k1_off - 1, -1) + 1
+          vget(v1, k1_off - 1) + 1
-        Map.get(v1, k1_off - 1, -1) < Map.get(v1, k1_off + 1, -1) ->
-          Map.get(v1, k1_off + 1, -1)
+        vget(v1, k1_off - 1) < vget(v1, k1_off + 1) ->
+          vget(v1, k1_off + 1)
         true ->
-          Map.get(v1, k1_off - 1, -1) + 1
+          vget(v1, k1_off - 1) + 1
       end
     y1 = x1 - k1
     {x1, y1} = snake_forward(a, b, a_lo, b_lo, n, m, x1, y1)
-    v1 = Map.put(v1, k1_off, x1)
+    vput(v1, k1_off, x1)
     cond do
       x1 > n ->
-        # Ran off the right edge — stop extending this side, shrink k range.
         forward_sweep(
           a,
           b,
@@ -335,7 +308,6 @@
         )
       y1 > m ->
-        # Ran off the bottom — shrink the OTHER end of the k range.
         forward_sweep(
           a,
           b,
@@ -355,37 +327,13 @@
         )
       front? ->
-        # Overlap check on forward sweep (delta is odd).
         k2_off = v_off + delta - k1
-        if k2_off >= 0 and k2_off < 2 * v_off + 1 do
-          v2_x = Map.get(v2, k2_off, -1)
-          if v2_x != -1 do
-            # Reverse coordinate → forward: x2_forward = n - v2_x
-            x2_fwd = n - v2_x
-            if x1 >= x2_fwd do
-              {:found, x1, y1}
-            else
-              forward_sweep(
-                a,
-                b,
-                a_lo,
-                b_lo,
-                n,
-                m,
-                v_off,
-                delta,
-                front?,
-                v1,
-                v2,
-                d,
-                k1 + 2,
-                k1s,
-                k1e
-              )
-            end
+        if k2_off >= 0 and k2_off <= 2 * v_off do
+          v2_x = vget(v2, k2_off)
+          if v2_x != -1 and x1 >= n - v2_x do
+            {:found, x1, y1}
           else
             forward_sweep(
               a,
@@ -414,8 +362,8 @@
     end
   end
-  # Reverse sweep: same shape, mirrored. Snake compares from the END going
-  # backward.
+  # Reverse sweep — mirror of forward_sweep. Snake compares from the END
+  # going backward.
   defp reverse_sweep(
          _a,
          _b,
@@ -427,14 +375,14 @@
          _delta,
          _front?,
          _v1,
-         v2,
+         _v2,
          d,
          k2,
          k2s,
          k2e
        )
        when k2 > d - k2e do
-    {:cont, v2, k2s, k2e}
+    {:cont, k2s, k2e}
   end
   defp reverse_sweep(a, b, a_lo, b_lo, n, m, v_off, delta, front?, v1, v2, d, k2, k2s, k2e) do
@@ -443,21 +391,21 @@
     x2 =
       cond do
         k2 == -d ->
-          Map.get(v2, k2_off + 1, -1) |> max(0)
+          max(vget(v2, k2_off + 1), 0)
         k2 == d ->
-          Map.get(v2, k2_off - 1, -1) + 1
+          vget(v2, k2_off - 1) + 1
-        Map.get(v2, k2_off - 1, -1) < Map.get(v2, k2_off + 1, -1) ->
-          Map.get(v2, k2_off + 1, -1)
+        vget(v2, k2_off - 1) < vget(v2, k2_off + 1) ->
+          vget(v2, k2_off + 1)
         true ->
-          Map.get(v2, k2_off - 1, -1) + 1
+          vget(v2, k2_off - 1) + 1
       end
     y2 = x2 - k2
     {x2, y2} = snake_reverse(a, b, a_lo, b_lo, n, m, x2, y2)
-    v2 = Map.put(v2, k2_off, x2)
+    vput(v2, k2_off, x2)
     cond do
       x2 > n ->
@@ -499,39 +447,15 @@
         )
       not front? ->
-        # Overlap check on reverse sweep (delta is even).
         k1_off = v_off + delta - k2
-        if k1_off >= 0 and k1_off < 2 * v_off + 1 do
-          v1_x = Map.get(v1, k1_off, -1)
-          if v1_x != -1 do
+        if k1_off >= 0 and k1_off <= 2 * v_off do
+          v1_x = vget(v1, k1_off)
+          if v1_x != -1 and v1_x >= n - x2 do
             x1_fwd = v1_x
-            x2_fwd = n - x2
-            if x1_fwd >= x2_fwd do
-              # Use forward coordinates of the meeting point.
-              y1_fwd = x1_fwd - (k1_off - v_off)
-              {:found, x1_fwd, y1_fwd}
-            else
-              reverse_sweep(
-                a,
-                b,
-                a_lo,
-                b_lo,
-                n,
-                m,
-                v_off,
-                delta,
-                front?,
-                v1,
-                v2,
-                d,
-                k2 + 2,
-                k2s,
-                k2e
-              )
-            end
+            y1_fwd = x1_fwd - (k1_off - v_off)
+            {:found, x1_fwd, y1_fwd}
           else
             reverse_sweep(
               a,
@@ -561,9 +485,6 @@
   end
   # ── Snakes ────────────────────────────────────────────────────────────
-  #
-  # snake_forward: extend (x, y) forward (in local coordinates within the
-  # bisect range) as long as a[a_lo + x] == b[b_lo + y].
   defp snake_forward(a, b, a_lo, b_lo, n, m, x, y) when x < n and y < m do
     if elem(a, a_lo + x) == elem(b, b_lo + y) do
@@ -574,9 +495,6 @@
   end
   defp snake_forward(_a, _b, _a_lo, _b_lo, _n, _m, x, y), do: {x, y}
-  # snake_reverse: extend (x, y) backward — local x, y are STEPS BACK from
-  # the (n, m) corner. Compare a[a_lo + n - 1 - x] vs b[b_lo + m - 1 - y].
   defp snake_reverse(a, b, a_lo, b_lo, n, m, x, y) when x < n and y < m do
     ai = a_lo + n - 1 - x
@@ -590,4 +508,21 @@
   end
   defp snake_reverse(_a, _b, _a_lo, _b_lo, _n, _m, x, y), do: {x, y}
+  # ── :atomics helpers ──────────────────────────────────────────────────
+  #
+  # atomics is 1-indexed; we add 1 to convert from k_offset (0-indexed) to
+  # atomics index. signed:true is the default but stated explicitly above.
+  defp vget(v, k_off), do: :atomics.get(v, k_off + 1)
+  defp vput(v, k_off, x), do: :atomics.put(v, k_off + 1, x)
+  defp fill_neg1(_v, 0), do: :ok
+  defp fill_neg1(v, i),
+    do:
+      (
+        :atomics.put(v, i, -1)
+        fill_neg1(v, i - 1)
+      )
 end
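The overlap checks in `forward_sweep` and `reverse_sweep` rest on two coordinate facts. First, flipping both axes maps diagonal k = x - y to delta - k, with delta = n - m. Second, a reverse x value v (steps back from the (n, m) corner) corresponds to forward x = n - v. The sketch below restates only that arithmetic and leaves out the recursion. `OverlapSketch`, `mirror_diagonal/3`, `to_forward/3` and `overlap?/3` are hypothetical names, not functions in the module.

```elixir
defmodule OverlapSketch do
  # Hypothetical helpers, not part of ex_git_objectstore. Forward points are
  # measured from (0, 0); reverse points are steps back from (n, m).

  # Diagonal k = x - y. Flipping both axes gives (n - x) - (m - y) = delta - k.
  def mirror_diagonal(k, n, m), do: n - m - k

  # Convert a reverse-path point into forward coordinates.
  def to_forward({xr, yr}, n, m), do: {n - xr, m - yr}

  # A forward path at x1 meets the reverse value v2_x on the mirrored
  # diagonal once x1 reaches the reverse path's forward x, n - v2_x.
  # A v2_x of -1 means the reverse path has not reached that diagonal yet.
  def overlap?(_x1, -1, _n), do: false
  def overlap?(x1, v2_x, n), do: x1 >= n - v2_x
end
```

As a worked case, take n = 5 and m = 3. Forward diagonal 1 pairs with reverse diagonal 1. The reverse point {2, 1} is the forward point {3, 2}. A forward path that has reached x1 = 3 on that diagonal therefore meets it, which is the condition `x1 >= n - v2_x` tests.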