feat: rename/copy detection in diffs (similarity-based) #41

Status: open. Opened by cole.christensen@gmail.com

Diff.diff_trees/3 currently reports a rename as a delete plus an add, which makes PR diffs unreadable. We need similarity-based rename and copy detection.

Scope

  • Content-similarity score between blobs (shingle/rolling-hash or line-based Jaccard, whichever reproduces the scores git reports on the fixture).
  • Configurable threshold, defaulting to 50% as in git.
  • Diff.diff_trees/3 returns entries tagged {:renamed, old_path, new_path, similarity} / {:copied, ...} in addition to {:added, ...} / {:deleted, ...}.
  • Avoid O(N²) pairwise comparison by bucketing candidates by size first, so only similarly sized blobs are scored.

Acceptance

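  • Matches git's output on a fixture with one renamed file, one modified-and-renamed file, and one copied file. Note that git diff -M reports renames only. Copies need -C, plus --find-copies-harder if the copy's source file is unmodified.
  • Performance: diffing a pair of 1,000-file trees takes under 500 ms.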
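The pairing step behind the tagged entries and the size bucketing in Scope can be sketched in Python with hypothetical names (detect_renames, find_copies, copy_sources). Deleted and added blobs are matched greedily by descending score, with each path used at most once. Results come back as tuples mirroring the {:renamed, ...} / {:copied, ...} / {:deleted, ...} / {:added, ...} tags. Instead of fixed buckets, the sketch sorts sources by size and only scores those within [size × t, size / t]. That shortcut is safe only for scores bounded by min(size)/max(size), and line-based Jaccard does not guarantee this bound. The scorer is passed in. Copy detection is opt-in, loosely mirroring git's -C, and copy_sources stands in for whichever surviving pre-image files the caller offers as copy sources.

```python
from bisect import bisect_left, bisect_right
from typing import Callable, Dict, Iterator, List, Optional, Tuple

# Hypothetical scorer signature: (old_blob, new_blob) -> score in [0, 1].
Similarity = Callable[[bytes, bytes], float]
Blobs = Dict[str, bytes]


def _build_index(blobs: Blobs) -> Tuple[List[Tuple[int, str]], List[int]]:
    """Sort (size, path) pairs so size windows can be found by bisection."""
    index = sorted((len(blob), path) for path, blob in blobs.items())
    return index, [size for size, _ in index]


def _size_window(size: int, index: List[Tuple[int, str]], sizes: List[int],
                 threshold: float) -> Iterator[str]:
    """Yield paths whose size lies in [size * threshold, size / threshold].

    Skipping everything else is only safe if score <= min(size) / max(size).
    """
    lo = bisect_left(sizes, size * threshold)
    hi = bisect_right(sizes, size / threshold)
    for _, path in index[lo:hi]:
        yield path


def detect_renames(deleted: Blobs, added: Blobs, similarity: Similarity,
                   threshold: float = 0.5, find_copies: bool = False,
                   copy_sources: Optional[Blobs] = None) -> List[tuple]:
    if not 0 < threshold <= 1:
        raise ValueError("threshold must be in (0, 1]")

    # Score only size-compatible (deleted, added) pairs; keep those >= threshold.
    index, sizes = _build_index(deleted)
    scored = []
    for new_path, new_blob in added.items():
        for old_path in _size_window(len(new_blob), index, sizes, threshold):
            score = similarity(deleted[old_path], new_blob)
            if score >= threshold:
                scored.append((score, old_path, new_path))

    # Greedy matching: highest score first (ties broken by path), each path used once.
    scored.sort(key=lambda t: (-t[0], t[1], t[2]))
    used_old, used_new, result = set(), set(), []
    for score, old_path, new_path in scored:
        if old_path in used_old or new_path in used_new:
            continue
        used_old.add(old_path)
        used_new.add(new_path)
        result.append(("renamed", old_path, new_path, score))

    if find_copies:
        # Unmatched additions may copy any deleted file (even one already
        # renamed) or a surviving file offered in copy_sources.
        sources = {**(copy_sources or {}), **deleted}
        c_index, c_sizes = _build_index(sources)
        for new_path in sorted(set(added) - used_new):
            new_blob = added[new_path]
            candidates = _size_window(len(new_blob), c_index, c_sizes, threshold)
            best = max(((similarity(sources[p], new_blob), p) for p in candidates),
                       default=None)
            if best is not None and best[0] >= threshold:
                used_new.add(new_path)
                result.append(("copied", best[1], new_path, best[0]))

    # Whatever is still unmatched stays a plain delete or add.
    result += [("deleted", p) for p in sorted(set(deleted) - used_old)]
    result += [("added", p) for p in sorted(set(added) - used_new)]
    return result
```

For example, with an exact-match scorer (lambda a, b: 1.0 if a == b else 0.0), deleting a.txt and adding an identical b.txt yields [("renamed", "a.txt", "b.txt", 1.0)]. Greedy matching is the simple choice here. The fixture should confirm whether it reproduces git's pairing when several candidates tie.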

Blocks: rename-aware log history (separate ticket).