Parallelize S3 list_refs GETs and put_pack uploads #13

Merged: colechristensen <cole.christensen@gmail.com> wants to merge feat/s3-parallel-list-refs-put-pack into main
No CI

Summary

  • S3.list_refs/3 now issues per-ref GETs concurrently via Task.async_stream (default max_concurrency: 32)
  • S3.put_pack/5 uploads .pack and .idx concurrently via Task.async + Task.await_many
  • Both wrapped in :telemetry.span/3 under the new [:ex_git_objectstore, :storage, _] namespace so the improvement is observable in production
  • Tuning knobs (list_refs_concurrency, list_refs_timeout, put_pack_timeout) live in the S3 config map. No Anvil-side changes are required, and the defaults are sensible.
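
The concurrent list_refs path described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the module name, fetch_ref/1 (a stand-in for the per-ref S3 GET), and the 30_000 ms timeout default are assumptions. Only the config keys and the max_concurrency default of 32 come from the summary.

```elixir
defmodule ListRefsSketch do
  # Hypothetical stand-in for one per-ref S3 GET; the real call is not shown here.
  defp fetch_ref(key) do
    Process.sleep(10)
    {:ok, {key, "sha-for-" <> key}}
  end

  # `config` mirrors the knobs named in the summary.
  # The 30_000 ms timeout default is an assumption, not the PR's value.
  def list_refs(keys, config \\ %{}) do
    max_concurrency = Map.get(config, :list_refs_concurrency, 32)
    timeout = Map.get(config, :list_refs_timeout, 30_000)

    keys
    |> Task.async_stream(&fetch_ref/1,
      max_concurrency: max_concurrency,
      timeout: timeout,
      on_timeout: :kill_task
    )
    |> Enum.reduce_while({:ok, []}, fn
      # Each stream element wraps the task's own return value in {:ok, _}.
      {:ok, {:ok, ref}}, {:ok, acc} -> {:cont, {:ok, [ref | acc]}}
      {:ok, {:error, reason}}, _acc -> {:halt, {:error, reason}}
      # A killed (timed-out) or crashed task surfaces as {:exit, reason}.
      {:exit, reason}, _acc -> {:halt, {:error, reason}}
    end)
    |> case do
      {:ok, refs} -> {:ok, Enum.sort(refs)}
      error -> error
    end
  end
end
```

Setting list_refs_concurrency: 1 in the config map makes the same code path run sequentially, which is the shape of the low-concurrency smoke test listed in the test plan.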

Closes #25

Impact

  • Protocol advertisement on ref-heavy repos: ~25s → ~1s. A repo with 50 branches + 200 tags used to trigger 250 sequential 100ms GETs on every clone/fetch/push. Now parallel, bounded by max_concurrency: 32.
  • Pack upload latency on every push that produces a pack drops from the sum of the two PUTs to the slower of the two, so it is at most halved.
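
The ref-advertisement figures above follow from simple arithmetic. The sketch below assumes every GET takes the same 100ms and that requests complete in full batches of max_concurrency; the module and function names are illustrative only.

```elixir
defmodule LatencyEstimate do
  # Back-of-the-envelope estimate under a uniform per-request latency assumption.
  def estimate(ref_count, latency_ms, max_concurrency) do
    # Ceiling division: number of batches needed to cover all refs.
    waves = div(ref_count + max_concurrency - 1, max_concurrency)

    %{
      sequential_ms: ref_count * latency_ms,
      parallel_ms: waves * latency_ms
    }
  end
end

# 50 branches + 200 tags, 100ms per GET, max_concurrency: 32
LatencyEstimate.estimate(50 + 200, 100, 32)
# => %{sequential_ms: 25000, parallel_ms: 800}
```

The 800ms result is the ideal lower bound for the GETs alone, which is consistent with the ~1s figure once listing and other overhead are included.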

Partial-write semantics (put_pack)

If one of the two concurrent PUTs succeeds and the other fails, the successful object is left in place and the function returns {:error, reason}. A .pack without a matching .idx is unreachable through any lookup path, so GC/fsck reclaims it. Retries with the same pack_sha overwrite the orphan. Documented in the moduledoc alongside the existing CAS note.
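
A rough sketch of these semantics, using Task.async and Task.await_many as named in the summary. It is not the PR's implementation: the module name, the injected put_fun (a stand-in for the S3 PUT), the object key layout, and the 60_000 ms default timeout are all assumptions made for illustration.

```elixir
defmodule PutPackSketch do
  # put_fun is a hypothetical stand-in for one S3 PUT:
  #   (key, binary) -> :ok | {:error, reason}
  # The "pack-<sha>.pack" / ".idx" key layout is assumed, not taken from the PR.
  def put_pack(pack_sha, pack_bin, idx_bin, put_fun, timeout \\ 60_000) do
    tasks = [
      Task.async(fn -> put_fun.("pack-#{pack_sha}.pack", pack_bin) end),
      Task.async(fn -> put_fun.("pack-#{pack_sha}.idx", idx_bin) end)
    ]

    # Waits for both uploads; exits the caller if the timeout elapses first.
    results = Task.await_many(tasks, timeout)

    # No rollback: if one PUT succeeded and the other failed, the uploaded
    # object stays in place and the first error is returned.
    case Enum.find(results, &match?({:error, _}, &1)) do
      nil -> :ok
      {:error, reason} -> {:error, reason}
    end
  end
end
```

Because the keys are derived from pack_sha, calling the function again with the same arguments writes to the same keys, which is how a retry overwrites an orphan.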

Telemetry

Two new events (S3 backend only for now; Filesystem/Memory may adopt in a follow-up):

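Event                                          | Measurements (stop)              | Metadata
-----------------------------------------------+----------------------------------+----------------------
[:ex_git_objectstore, :storage, :list_refs, _] | :duration, :ref_count            | :ref_prefix, :backend
[:ex_git_objectstore, :storage, :put_pack, _]  | :duration, :pack_size, :idx_size | :pack_sha, :backend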

Full event list is documented in the ExGitObjectstore.Telemetry moduledoc.
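
For consumers, a handler for the two stop events might look like the sketch below. It assumes the :telemetry package is available as a dependency, and that the trailing _ in the event names stands for the :start, :stop, and :exception suffixes that :telemetry.span/3 emits. The module name, handler id, and log format are hypothetical.

```elixir
defmodule StorageTelemetrySketch do
  @events [
    [:ex_git_objectstore, :storage, :list_refs, :stop],
    [:ex_git_objectstore, :storage, :put_pack, :stop]
  ]

  # Requires the :telemetry dependency at runtime.
  def attach do
    :telemetry.attach_many("storage-stop-logger", @events, &__MODULE__.handle_event/4, nil)
  end

  def handle_event(event, measurements, metadata, _config) do
    IO.puts(format(event, measurements, metadata))
  end

  # :telemetry.span/3 reports :duration in native time units, so convert
  # before printing.
  def format([:ex_git_objectstore, :storage, op, :stop], %{duration: native}, metadata) do
    ms = System.convert_time_unit(native, :native, :millisecond)
    "#{op} backend=#{inspect(metadata[:backend])} took #{ms}ms"
  end
end
```

Calling StorageTelemetrySketch.attach() once at application start would be enough to log every S3 list_refs and put_pack completion.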

Test plan

  • Existing 1001-ref pagination test extended with sort assertion
  • Low-concurrency smoke test (list_refs_concurrency: 1) verifies the config path
  • Concurrent put_pack round-trip with 128KB pack + 32KB idx
  • Error propagation test using a nonexistent bucket
  • Telemetry assertions for both events (backend metadata, size measurements, ref_count)
  • 595 total tests pass (566 non-S3 + 29 S3), mix dialyzer clean, mix format --check-formatted clean

Out of scope (follow-ups)

  • Filesystem/Memory backend telemetry (uniform coverage)
  • Retry/backoff for transient S3 errors
  • Range-based pack reads (NOTE(C7) in object_resolver.ex)
  • hackney connection pool tuning docs
Created Apr 17, 2026 at 18:21 UTC | Merged Apr 17, 2026 at 18:36 UTC by colechristensen cole.christensen@gmail.com