fangorn/ex_git_objectstore
feat: has_objects/2 existence check without object read #23
Links
- 🔒 private issue
Problem
Anvil’s clone/fork import path at `anvil/lib/anvil/code_review.ex:1204-1330` calls `put_object` for each loose object and has no way to check whether the object already exists. The S3 backend must `GET` then `PUT` for every object, regardless of whether it’s already in the destination, so clones and forks re-transfer every object on every import.
The library currently has no cheap existence probe.
Proposal
Add:
```elixir
@spec has_object?(repo :: Repo.t(), sha :: binary) :: boolean
@spec has_objects(repo :: Repo.t(), [binary]) :: %{binary => boolean}
```
Neither variant deserializes the object. On S3, `has_object?` uses `HEAD` instead of `GET`. On the filesystem, it uses `File.exists?/1` on the loose-object path plus a pack-index lookup.
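As a rough illustration of the filesystem half, the loose-object probe could look like the sketch below. The module and function names are hypothetical and not part of the library. It assumes the SHA is passed as a 40-character hex string and that the repo uses Git's standard loose-object layout (`objects/<first two hex chars>/<remaining 38>`). The pack-index lookup is left out.

```elixir
defmodule LooseObjectProbe do
  # Hypothetical helper for illustration; not an ex_git_objectstore module.

  # Git's loose-object layout: objects/<2 hex chars>/<38 hex chars>.
  def loose_path(git_dir, <<prefix::binary-size(2), rest::binary>>) do
    Path.join([git_dir, "objects", prefix, rest])
  end

  # Existence check only: the file is never opened or inflated.
  def has_loose_object?(git_dir, hex_sha) do
    git_dir |> loose_path(hex_sha) |> File.exists?()
  end
end
```

A real FS backend would fall through to the pack index when this returns `false`.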
Implementation notes
- Add `has_object?/2` to the Storage behaviour with a default implementation that calls `read_object` and pattern-matches on `{:ok, _}` — slow but always correct.
- Override in FS, S3, Memory backends with cheap versions.
- Consider `has_objects/2` returning a `MapSet` rather than a map for ergonomic `MapSet.member?/2` checks downstream.
- Pack index already tracks every object SHA; use that for O(1) checks on packed objects.
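One possible shape for the first three notes is sketched below. Everything here is an assumption for illustration: `StorageSketch`, `FallbackSketch`, and `MemorySketch` are made-up names, the repo is modelled as a plain map, and the real Storage behaviour may inject defaults differently. The slow default is provided through `use` plus `defoverridable`, and `has_objects/3` returns a `MapSet` of the SHAs that are present.

```elixir
defmodule StorageSketch do
  # Hypothetical stand-in for the library's Storage behaviour.
  @callback read_object(repo :: term(), sha :: binary()) :: {:ok, term()} | {:error, term()}
  @callback has_object?(repo :: term(), sha :: binary()) :: boolean()

  defmacro __using__(_opts) do
    quote do
      @behaviour StorageSketch

      # Slow but always correct default: do a full read and discard the result.
      @impl StorageSketch
      def has_object?(repo, sha), do: match?({:ok, _}, read_object(repo, sha))

      defoverridable has_object?: 2
    end
  end

  # Batch variant returning the set of SHAs the backend already holds.
  def has_objects(backend, repo, shas) do
    for sha <- shas, backend.has_object?(repo, sha), into: MapSet.new(), do: sha
  end
end

defmodule FallbackSketch do
  # Backend that only implements read_object and inherits the default probe.
  use StorageSketch

  @impl StorageSketch
  def read_object(repo, sha) do
    case Map.fetch(repo, sha) do
      {:ok, obj} -> {:ok, obj}
      :error -> {:error, :not_found}
    end
  end
end

defmodule MemorySketch do
  # Backend that overrides the default with a cheap key lookup.
  use StorageSketch

  @impl StorageSketch
  def read_object(repo, sha), do: FallbackSketch.read_object(repo, sha)

  @impl StorageSketch
  def has_object?(repo, sha), do: Map.has_key?(repo, sha)
end
```

Returning a `MapSet` keeps downstream checks to `MapSet.member?/2`, at the cost of diverging from the `%{binary => boolean}` spec in the proposal.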
Acceptance
- Fork/clone incremental imports can skip objects the destination already has.
- S3 backend never issues a `GET` for existence checks.
- Benchmarked: re-importing 10k objects on a destination that already has them drops from ~15min (S3) to a few seconds.
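The first criterion could be exercised with something like the following sketch. `ImportSketch` and its arguments are hypothetical: `present` stands for the set of SHAs the destination already reports via the existence check, and `put_fun` stands in for the backend's `put_object` call.

```elixir
defmodule ImportSketch do
  # Hypothetical import loop; the real one lives in Anvil's import path.
  # Writes only the objects missing from `present` and returns their SHAs.
  def import_missing(objects, present, put_fun) do
    objects
    |> Enum.reject(fn {sha, _data} -> MapSet.member?(present, sha) end)
    |> Enum.map(fn {sha, data} ->
      put_fun.(sha, data)
      sha
    end)
  end
end
```

On a destination that already holds every object, `put_fun` is never called and the function returns an empty list.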