fangorn/ex_git_objectstore
feat: rename/copy detection in diffs (similarity-based) #41
open
Opened by cole.christensen@gmail.com
Diff.diff_trees/3 currently reports a rename as delete+add, which makes PR diffs unreadable. Need similarity-based rename and copy detection.
Scope
- Content-similarity score between blobs (shingle/rolling-hash or line-based Jaccard, whichever matches git's -M output).
- Threshold configurable, default 50% like git.
- Diff.diff_trees/3 returns entries tagged {:renamed, old_path, new_path, similarity} / {:copied, ...} in addition to {:added, ...} / {:deleted, ...}.
- Avoids O(N²) by bucketing by size first.
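
To make the scope above concrete, here is a rough sketch of the line-based Jaccard option combined with size bucketing. Everything in it is an assumption for discussion: the module name RenameSketch, the {path, content} input shape, the power-of-two size classes, and the greedy best-match pairing are all hypothetical and are not the existing Diff API. It is also not git's actual similarity algorithm, so it is not expected to match git's -M scores as written. Copy detection is left out; it would additionally need surviving files as candidate sources.

```elixir
defmodule RenameSketch do
  @moduledoc """
  Hypothetical sketch: line-based Jaccard similarity plus size bucketing.
  Not git's algorithm and not this library's real API.
  """

  # Default threshold of 50%, as proposed in the ticket.
  @default_threshold 0.5

  # Jaccard similarity over the sets of distinct non-empty lines of two binaries.
  def similarity(a, b) when is_binary(a) and is_binary(b) do
    set_a = line_set(a)
    set_b = line_set(b)
    union = MapSet.size(MapSet.union(set_a, set_b))

    if union == 0 do
      # Two blobs without any lines are treated as identical.
      1.0
    else
      MapSet.size(MapSet.intersection(set_a, set_b)) / union
    end
  end

  # deleted and added are lists of {path, content} tuples (assumed input shape).
  def detect(deleted, added, opts \\ []) do
    threshold = Keyword.get(opts, :threshold, @default_threshold)
    by_bucket = Enum.group_by(deleted, fn {_path, content} -> bucket(content) end)

    {entries, used} =
      Enum.reduce(added, {[], MapSet.new()}, fn {new_path, new_content}, {acc, used} ->
        b = bucket(new_content)

        # Only compare against deleted blobs in the same or a neighbouring size class.
        candidates =
          [b - 1, b, b + 1]
          |> Enum.flat_map(&Map.get(by_bucket, &1, []))
          |> Enum.reject(fn {old_path, _} -> MapSet.member?(used, old_path) end)
          |> Enum.map(fn {old_path, old_content} ->
            {similarity(old_content, new_content), old_path}
          end)

        best = if candidates == [], do: nil, else: Enum.max(candidates)

        case best do
          {score, old_path} when score >= threshold ->
            {[{:renamed, old_path, new_path, score} | acc], MapSet.put(used, old_path)}

          _ ->
            {[{:added, new_path} | acc], used}
        end
      end)

    # Deleted paths that were not consumed by a rename stay plain deletions.
    leftovers =
      for {old_path, _} <- deleted, not MapSet.member?(used, old_path), do: {:deleted, old_path}

    Enum.reverse(entries) ++ leftovers
  end

  defp line_set(content) do
    content |> String.split("\n", trim: true) |> MapSet.new()
  end

  # Size class = number of binary digits in the byte size.
  defp bucket(content) do
    content |> byte_size() |> Integer.digits(2) |> length()
  end
end
```

With a deleted lib/a.ex containing the lines one/two/three/four and an added lib/b.ex containing one/two/three/five, the two share 3 of 5 distinct lines, so detect/3 reports a rename with similarity 0.6 at the default threshold and a plain add plus delete at threshold: 0.7. The greedy pairing is the simplest thing that works; a real implementation would probably score all candidate pairs and pick the best global assignment.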
Acceptance
- Matches git diff -M output on a fixture with one renamed file, one modified-and-renamed file, one copied file.
- Performance: a 1k-file tree pair runs in under 500ms.
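
For the performance criterion, a minimal sketch of how the 500ms budget could be asserted in a test. PerfBudgetSketch and within_budget?/2 are hypothetical helper names, and the workload below is only a stand-in; the real check would wrap the diff of the two 1k-file trees.

```elixir
defmodule PerfBudgetSketch do
  # Runs a zero-arity function once and reports whether it finished within budget_ms.
  # :timer.tc/1 returns {elapsed_microseconds, result}.
  def within_budget?(fun, budget_ms) when is_function(fun, 0) do
    {elapsed_us, _result} = :timer.tc(fun)
    elapsed_us <= budget_ms * 1_000
  end
end

# Stand-in workload; a real test would call the tree diff here instead.
PerfBudgetSketch.within_budget?(fn -> Enum.sum(1..1_000) end, 500)
```

A single wall-clock run is noisy on shared CI machines, so taking the best of several runs may be a more stable way to enforce the budget.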
Blocks: rename-aware log history (separate ticket).