
feat: blob_sizes/3 for batched size lookups (#22)

Closes #22

Adds a bulk variant of `blob_size/2` that runs the lookups in parallel with bounded concurrency. Designed for consumers (Anvil's directory listing renderer, primarily) that currently call `blob_size/2` inside an `Enum.map`: on an S3-backed store each call is a separate round-trip, so a 1000-file directory takes ~100 s at 100 ms/call. `blob_sizes/3` issues all lookups via `Task.async_stream` with `max_concurrency: 16` (configurable), dropping the same workload to a few seconds.

Semantics:

* `blob_sizes(repo, [])` short-circuits to `{:ok, %{}}`
* Input shas are deduplicated before dispatch
* Returns `{:ok, %{sha => size}}` for every sha that resolved
* Shas that fail to resolve (missing, `:not_a_blob`, storage error, task timeout) are silently dropped from the result map
* `:max_concurrency` and `:timeout` opts allow per-call tuning

Note this does not yet reduce the per-blob read cost: each sha still reads full blob content via the existing `blob/2` path. The win here is purely parallelism. A follow-up could add Storage-level bulk size callbacks that use S3 HEAD + pack header parsing to skip content transfer entirely.

Tests cover: happy path, missing shas silently dropped, dedup, empty input, `max_concurrency: 1` smoke test, and a non-blob (tree) sha being filtered out.

6 new test cases; full suite 566 tests, 0 failures; dialyzer clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
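For readers who want to see the semantics above in one place, here is a minimal sketch, not the code from this commit. The module name `BlobSizesSketch`, the `lookup` function argument (a stand-in for `&blob_size(repo, &1)`), its assumed `{:ok, size}` / `{:error, reason}` return shape, and the 5_000 ms default timeout are all assumptions made for illustration; only `Task.async_stream`, the default `max_concurrency: 16`, and the listed behaviours come from the message.

```elixir
defmodule BlobSizesSketch do
  @moduledoc false

  # `lookup` is a hypothetical stand-in for `&blob_size(repo, &1)`.
  # It is assumed to return {:ok, size} or {:error, reason} rather than raise;
  # a raising lookup would crash the linked task and take the caller with it.
  def blob_sizes(lookup, shas, opts \\ [])

  # Empty input short-circuits without starting any tasks.
  def blob_sizes(_lookup, [], _opts), do: {:ok, %{}}

  def blob_sizes(lookup, shas, opts) when is_function(lookup, 1) do
    max_concurrency = Keyword.get(opts, :max_concurrency, 16)
    # Assumed default; the commit message does not state one.
    timeout = Keyword.get(opts, :timeout, 5_000)

    sizes =
      shas
      # Deduplicate before dispatch so each sha is looked up once.
      |> Enum.uniq()
      |> Task.async_stream(fn sha -> {sha, lookup.(sha)} end,
        max_concurrency: max_concurrency,
        timeout: timeout,
        # Timed-out tasks yield {:exit, :timeout} instead of exiting the caller.
        on_timeout: :kill_task
      )
      |> Enum.reduce(%{}, fn
        {:ok, {sha, {:ok, size}}}, acc -> Map.put(acc, sha, size)
        # Errors and timeouts are silently dropped from the result map.
        _failed_or_timed_out, acc -> acc
      end)

    {:ok, sizes}
  end
end

# Usage with an in-memory lookup standing in for a real store.
sizes = %{"a1" => 10, "b2" => 20}

lookup = fn sha ->
  case Map.fetch(sizes, sha) do
    {:ok, size} -> {:ok, size}
    :error -> {:error, :not_found}
  end
end

result = BlobSizesSketch.blob_sizes(lookup, ["a1", "b2", "a1", "missing"])
IO.inspect(result)
# prints {:ok, %{"a1" => 10, "b2" => 20}}
```

As a back-of-envelope check on the "few seconds" figure: 1000 lookups at 16-way concurrency is at best 63 waves of 100 ms, roughly 6.3 s, assuming latency stays flat under load.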
SHA: 6a8dd6440b1b6f37a15d907b2291303a0ccf0ec6
Author: Cole Christensen <cole.christensen@macmillan.com>
Date: 2026-04-13 16:51
Parents: 66017f9
3 files changed +131 -0
CHANGELOG.md +4 −0
@@ -9,6 +9,10 @@
### Added
- `blob_sizes/3` — batched variant of `blob_size/2` with bounded-concurrency
parallel reads, deduplication, and `{:ok, %{sha => size}}` return. Drops
the 100s-of-sequential-round-trips cost of rendering large directory
listings on S3-backed storage. See fangorn/ex_git_objectstore#22.
- Repository integrity verification (`Fsck.check/2`) with full and quick modes
- `list_objects/2` callback on Storage behaviour for enumerating loose objects
- Dialyzer enforced in CI pipeline