fangorn/ex_git_objectstore
public
ref:main
feat: bulk blob_sizes/2 for batched tree-entry size lookups #22
closed
Opened by cole.christensen@gmail.com
Problem
Anvil’s tree-rendering path calls `ExGitObjectstore.blob_size/2` once per entry inside an `Enum.map` (see `anvil/lib/anvil/git/objectstore.ex:522-536`), so a 1000-file directory costs 1000 individual calls. On the S3 storage backend each call is a separate network round-trip (100+ ms), and because the calls run one after another, a single listing of such a directory can take on the order of 100 seconds.
Proposal
Add a batched API:
```elixir
@spec blob_sizes(repo :: Repo.t(), [sha :: binary]) ::
        {:ok, %{binary => non_neg_integer}} | {:error, term}
def blob_sizes(repo, shas)
```
Returns a map of `sha => size` for every input sha. Unknown shas are simply omitted from the map.
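To make the return contract concrete, here is a minimal sketch of the semantics described above. It assumes a toy repo modelled as a plain map of `sha => blob contents`; the module name `BlobSizesSketch` and that repo shape are illustrative only and are not part of the library.

```elixir
defmodule BlobSizesSketch do
  # Hypothetical stand-in for a repo: a plain map of sha => blob contents.
  @type repo :: %{binary => binary}

  @spec blob_sizes(repo, [binary]) ::
          {:ok, %{binary => non_neg_integer}} | {:error, term}
  def blob_sizes(repo, shas) when is_map(repo) and is_list(shas) do
    sizes =
      # Duplicated input shas collapse to one entry; unknown shas are skipped.
      for sha <- Enum.uniq(shas), is_map_key(repo, sha), into: %{} do
        {sha, byte_size(Map.fetch!(repo, sha))}
      end

    {:ok, sizes}
  end
end

repo = %{"aaa" => "hello", "bbb" => ""}
{:ok, sizes} = BlobSizesSketch.blob_sizes(repo, ["aaa", "bbb", "aaa", "missing"])
# sizes == %{"aaa" => 5, "bbb" => 0}
```

Callers that need to tell "missing" apart from "zero bytes" can check `Map.has_key?/2` on the result, since an empty blob still appears with size `0`.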
Implementation notes
- Deduplicate input shas internally before touching storage.
- Filesystem backend: loop on top of the existing single-object read path — cheap.
- S3 backend: use a batched attributes call if the S3 client offers one (the standard `GetObjectAttributes` and `HeadObject` operations are per-object), otherwise issue parallel `HeadObject` requests with a bounded concurrency pool (e.g. `Task.async_stream` with `max_concurrency: 16`).
- Memory backend: trivial `Map.take`.
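- Pack-backed objects: look the object up in the pack index and read its size from the pack entry header without inflating the whole blob (the index itself stores offsets, not sizes; deltified entries need the result size from the delta header).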
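The dedupe-then-fan-out approach from the notes above could look roughly like this. It is a sketch under assumptions: `size_fun` is a hypothetical per-object lookup (for example, one `HeadObject` call) that returns `{:ok, size}` or `{:error, reason}`, with `:not_found` as the assumed reason for unknown shas; the module name and the 30-second timeout are also made up for illustration.

```elixir
defmodule BoundedSizeLookup do
  # `size_fun` is a hypothetical per-object lookup returning
  # {:ok, size} | {:error, :not_found} | {:error, reason}.
  def sizes(shas, size_fun, opts \\ []) do
    max_concurrency = Keyword.get(opts, :max_concurrency, 16)

    shas
    # Deduplicate before touching storage.
    |> Enum.uniq()
    |> Task.async_stream(fn sha -> {sha, size_fun.(sha)} end,
      max_concurrency: max_concurrency,
      ordered: false,
      timeout: 30_000
    )
    |> Enum.reduce_while({:ok, %{}}, fn
      # Found: record the size.
      {:ok, {sha, {:ok, size}}}, {:ok, acc} ->
        {:cont, {:ok, Map.put(acc, sha, size)}}

      # Unknown sha: omit it from the result map.
      {:ok, {_sha, {:error, :not_found}}}, acc ->
        {:cont, acc}

      # Any other storage error aborts the whole batch.
      {:ok, {_sha, {:error, reason}}}, _acc ->
        {:halt, {:error, reason}}
    end)
  end
end
```

With `max_concurrency: 16`, 1000 lookups at roughly 100 ms each run in about 63 sequential waves instead of 1000, which is where most of the expected speedup would come from.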
Out of scope
- Typed bulk reads (`read_commits/2`, `read_trees/2`) — separate issue.
- Prefetch/warmup for future reads — separate issue.
Acceptance
- `blob_sizes/2` exists, is documented, and has a typespec.
- Storage behaviour gains a `bulk_sizes/2` callback with a generic fallback so backends don’t have to implement it right away.
- Benchmarked: 1000 shas on S3 go from ~100 s to sub-second.
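One possible shape for the optional `bulk_sizes/2` callback with a generic fallback is sketched below. Everything except the `bulk_sizes/2` name is an assumption: the `StorageSketch` behaviour, its `size/2` callback, the `:not_found` error reason, and the memory backend are hypothetical and only illustrate the dispatch pattern.

```elixir
defmodule StorageSketch do
  # Hypothetical storage behaviour; only bulk_sizes/2 is named in this issue.
  @callback size(state :: term, sha :: binary) ::
              {:ok, non_neg_integer} | {:error, term}
  @callback bulk_sizes(state :: term, shas :: [binary]) ::
              {:ok, %{binary => non_neg_integer}} | {:error, term}
  @optional_callbacks bulk_sizes: 2

  # Dispatch to the backend's bulk_sizes/2 if it defines one,
  # otherwise loop over its single-object size/2.
  def bulk_sizes(backend, state, shas) do
    shas = Enum.uniq(shas)
    Code.ensure_loaded(backend)

    if function_exported?(backend, :bulk_sizes, 2) do
      backend.bulk_sizes(state, shas)
    else
      fallback(backend, state, shas)
    end
  end

  defp fallback(backend, state, shas) do
    Enum.reduce_while(shas, {:ok, %{}}, fn sha, {:ok, acc} ->
      case backend.size(state, sha) do
        {:ok, size} -> {:cont, {:ok, Map.put(acc, sha, size)}}
        {:error, :not_found} -> {:cont, {:ok, acc}}
        {:error, reason} -> {:halt, {:error, reason}}
      end
    end)
  end
end

defmodule MemoryBackendSketch do
  # Implements only size/2, so the generic fallback is used.
  @behaviour StorageSketch

  @impl true
  def size(objects, sha) do
    case Map.fetch(objects, sha) do
      {:ok, blob} -> {:ok, byte_size(blob)}
      :error -> {:error, :not_found}
    end
  end
end
```

Marking the callback optional keeps existing backends compiling without warnings, and each backend can later add a native `bulk_sizes/2` without any change to callers.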