fangorn/ex_git_objectstore
feat: blob_sizes/3 for batched size lookups (#22)
Closes #22
Adds a bulk variant of `blob_size/2` that runs the lookups in parallel
with bounded concurrency. Designed for consumers (primarily Anvil's
directory listing renderer) that currently call `blob_size/2` inside an
`Enum.map`. On an S3-backed store each call is a separate round-trip,
so a 1000-file directory takes ~100 s at 100 ms/call. `blob_sizes/3`
issues all lookups via `Task.async_stream` with `max_concurrency: 16`
(configurable), dropping the same workload to a few seconds (roughly
1000 / 16 × 100 ms ≈ 6 s, before task overhead).
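The timing figures above can be checked with a back-of-the-envelope model. This is a rough sketch, not part of the change: `LatencyEstimate` is a hypothetical helper, and it assumes every call takes exactly the same time and that lookups complete in full waves of `concurrency` parallel calls.

```elixir
defmodule LatencyEstimate do
  # Hypothetical helper for illustration only; not part of ex_git_objectstore.
  # Assumes uniform per-call latency and ignores task scheduling overhead.
  def seconds(count, per_call_ms, concurrency) do
    # Lookups finish in waves of `concurrency` parallel calls.
    waves = ceil(count / concurrency)
    waves * per_call_ms / 1000
  end
end

# Sequential Enum.map: 1000 waves of one call each.
IO.inspect(LatencyEstimate.seconds(1000, 100, 1))
# Default max_concurrency of 16: 63 waves.
IO.inspect(LatencyEstimate.seconds(1000, 100, 16))
```

Under these assumptions the sequential case comes out at 100 s and the 16-way case at about 6.3 s. Real numbers will vary with S3 latency spread.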
Semantics:
* `blob_sizes(repo, [])` short-circuits to `{:ok, %{}}`
* Input shas are deduplicated before dispatch
* Returns `{:ok, %{sha => size}}` for every sha that resolved
* Shas that fail to resolve (missing, `:not_a_blob`, storage error,
task timeout) are silently dropped from the result map
* `:max_concurrency` and `:timeout` opts allow per-call tuning
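The semantics listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in this commit: `BlobSizesSketch` is a hypothetical module, `repo` is a plain map standing in for a real store, the `:not_found` error atom is made up, and the 5000 ms default timeout is assumed (it matches the `Task.async_stream` default).

```elixir
defmodule BlobSizesSketch do
  # Hypothetical stand-in; the real library's internals may differ.
  # max_concurrency: 16 comes from the commit message; the timeout is assumed.
  @default_opts [max_concurrency: 16, timeout: 5_000]

  # Stand-in for blob_size/2: `repo` is a map of sha => {type, content}.
  def blob_size(repo, sha) do
    case Map.fetch(repo, sha) do
      {:ok, {:blob, content}} -> {:ok, byte_size(content)}
      {:ok, _other} -> {:error, :not_a_blob}
      :error -> {:error, :not_found}
    end
  end

  def blob_sizes(repo, shas, opts \\ [])

  # Empty input short-circuits without spawning any tasks.
  def blob_sizes(_repo, [], _opts), do: {:ok, %{}}

  def blob_sizes(repo, shas, opts) do
    opts = Keyword.merge(@default_opts, opts)

    sizes =
      shas
      # Deduplicate before dispatch so each sha is looked up once.
      |> Enum.uniq()
      |> Task.async_stream(fn sha -> {sha, blob_size(repo, sha)} end,
        max_concurrency: opts[:max_concurrency],
        timeout: opts[:timeout],
        # Timed-out tasks yield {:exit, :timeout} instead of crashing the caller.
        on_timeout: :kill_task
      )
      |> Enum.reduce(%{}, fn
        {:ok, {sha, {:ok, size}}}, acc -> Map.put(acc, sha, size)
        # Errors, non-blobs, and timeouts are silently dropped.
        _failed_or_timed_out, acc -> acc
      end)

    {:ok, sizes}
  end
end

repo = %{"a1" => {:blob, "hello"}, "b2" => {:tree, "ignored"}}

# Duplicate, non-blob, and missing shas all collapse away.
IO.inspect(BlobSizesSketch.blob_sizes(repo, ["a1", "a1", "b2", "missing"]))
# → {:ok, %{"a1" => 5}}
```

Because failures are dropped rather than reported, a caller that needs to tell "missing" apart from "not a blob" would still have to fall back to `blob_size/2` for the shas absent from the result map.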
Note this does not yet reduce the per-blob read cost — each sha still
reads full blob content via the existing `blob/2` path. The win here
is purely parallelism. A follow-up could add Storage-level bulk size
callbacks that use S3 HEAD + pack header parsing to skip content
transfer entirely.
Tests cover: happy path, missing shas silently dropped, dedup, empty
input, max_concurrency: 1 smoke test, and a non-blob (tree) sha being
filtered out. 6 new test cases; full suite 566 tests, 0 failures;
dialyzer clean.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
SHA: 6a8dd6440b1b6f37a15d907b2291303a0ccf0ec6
Author: Cole Christensen <cole.christensen@macmillan.com>
Date: 2026-04-13 16:51
Parents: 66017f9
3 files changed, +131 −0
| File | Changes |
|---|---|
| CHANGELOG.md | +4 −0 |