# Changelog
All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [Unreleased]
### Added
- **Git LFS support (spec v1).** Full Large File Storage implementation
exposed as pure request/response modules, matching the existing
`UploadPack`/`ReceivePack` style (no HTTP server in-tree).
- `ExGitObjectstore.Lfs.Pointer` — parse and emit spec-compliant
pointer blobs with strict validation (version-first, alphabetical
key order, sha256-only OIDs, canonical decimal size, trailing LF).
- `ExGitObjectstore.Lfs.Store` — behaviour parallel to `Storage`,
keyed by SHA256. Streaming `put/4` verifies the observed SHA256
matches the claimed OID and discards the write on mismatch.
Shared conformance test suite at
`ExGitObjectstore.Test.LfsStoreConformance`.
- Backends: `Lfs.Store.Memory`, `Lfs.Store.Filesystem`,
`Lfs.Store.S3`. S3 uses multipart upload for streaming PUT and
exposes optional `presigned_upload/5` / `presigned_download/4`
callbacks for direct-to-S3 client transfers.
- `ExGitObjectstore.Lfs.Batch` — Batch API handler
(`POST /objects/batch`) returning spec-compliant upload / download /
verify actions. Uses presigned URLs when the backend supports them;
falls back to library-served URLs for Filesystem/Memory.
- `ExGitObjectstore.Lfs.Transfer` — basic-transfer handlers for
`GET/PUT /objects/:oid` and `POST /objects/:oid/verify`, with
streaming downloads and SHA256-verified uploads.
- `ExGitObjectstore.Lfs.Locks` — Locks API v1: create, list, verify,
unlock (with `force` for admin override). Lock metadata stored on
the repo's `Storage` backend under `lfs/locks/*.json`.
- Telemetry spans emitted at
`[:ex_git_objectstore, :lfs, :batch | :transfer | :lock]`.
- `Repo` gains optional `:lfs_storage` option alongside `:storage`.
- End-to-end interop coverage against the real `git lfs` binary via
a Bandit-backed test HTTP adapter. 11 scenarios exercise push,
smudge (download), idempotent re-push, edge-case payload sizes
(0-byte via direct HTTP, 1-byte via the CLI), multi-file batch
pushes, mixed-state batches (present + absent), concurrent
parallel uploads (10 files with `lfs.concurrenttransfers=8`),
direct-HTTP OID tampering (server must respond 422 and leave nothing on
disk), lock create/list/verify/unlock with conflict and 403-by-non-owner
paths, and a full end-to-end `git push` → `git clone`
→ `git lfs pull` roundtrip over smart-http. Found and fixed one
real bug: `Batch.handle/3` was double-prefixing `repos/<id>/lfs`
into action URLs — the `:base_url` is now the LFS root itself
and the module emits `<base_url>/objects/<oid>` and
`<base_url>/verify`. The test adapter also wires the existing
`UploadPack`/`ReceivePack` modules to the git smart-http v0
endpoints (`GET /info/refs`, `POST /git-upload-pack`,
`POST /git-receive-pack`) so a real `git clone` can complete
the full clone-then-lfs-pull flow.
- S3 backend interop coverage: 14 conformance tests against real
MinIO plus 2 end-to-end `git lfs push`/`smudge` scenarios that
exercise the presigned-URL path — client uploads directly to
MinIO via presigned PUTs and downloads via presigned GETs, with
the library only mediating batch + verify.
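
The pointer format and the SHA256 check on streaming writes described above can be sketched in plain Elixir. This is a minimal illustration under stated assumptions. The module and function names (`LfsPointerSketch`, `from_content/1`, `emit/2`, `parse/1`, `verify_stream/2`) are hypothetical and are not the library's `Lfs.Pointer` or `Lfs.Store` API. Only the three required pointer keys are handled; extension keys are not.

```elixir
defmodule LfsPointerSketch do
  # Hypothetical helpers illustrating the LFS v1 pointer format and a
  # SHA256-verified streaming write; NOT the library's Lfs.Pointer/Lfs.Store API.
  # Only the three required keys are handled (no extension keys).
  @version_line "version https://git-lfs.github.com/spec/v1"

  # Pointer for raw content: lowercase-hex SHA256 OID plus byte size.
  def from_content(content) when is_binary(content) do
    oid = :crypto.hash(:sha256, content) |> Base.encode16(case: :lower)
    emit(oid, byte_size(content))
  end

  # `version` first, remaining keys alphabetical (oid < size), LF after every line.
  def emit(oid, size) when is_binary(oid) and is_integer(size) and size >= 0 do
    "#{@version_line}\noid sha256:#{oid}\nsize #{size}\n"
  end

  # Strict parse: exact key order, sha256-only OID, canonical decimal size.
  def parse(text) when is_binary(text) do
    with true <- String.ends_with?(text, "\n") || {:error, :missing_trailing_lf},
         [@version_line, "oid sha256:" <> oid, "size " <> size] <-
           text |> String.split("\n") |> Enum.drop(-1),
         true <- Regex.match?(~r/\A[0-9a-f]{64}\z/, oid) || {:error, :bad_oid},
         true <- Regex.match?(~r/\A(0|[1-9][0-9]*)\z/, size) || {:error, :bad_size} do
      {:ok, %{oid: oid, size: String.to_integer(size)}}
    else
      {:error, _} = error -> error
      _other -> {:error, :malformed}
    end
  end

  # Hash streamed chunks incrementally and compare with the claimed OID
  # before accepting the write.
  def verify_stream(chunks, claimed_oid) do
    digest =
      chunks
      |> Enum.reduce(:crypto.hash_init(:sha256), &:crypto.hash_update(&2, &1))
      |> :crypto.hash_final()
      |> Base.encode16(case: :lower)

    if digest == claimed_oid, do: :ok, else: {:error, {:oid_mismatch, digest}}
  end
end
```

In a real upload path, the `{:error, {:oid_mismatch, _}}` branch is where the partial write would be discarded, as the `put/4` entry describes.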
### Fixed
- **UploadPackV2: omit `acknowledgments` section when the client sends
`done`.** Real `git fetch` (v2.53) was failing against production
with `fatal: expected 'packfile', received 'acknowledgments'`. Per
`fetch-pack.c`: when `send_fetch_request` writes a `done\n` pkt-line
it returns `done_sent=1`, and the client state machine transitions
directly from `FETCH_SEND_REQUEST` to `FETCH_GET_PACK` — bypassing
`FETCH_PROCESS_ACKS`. `FETCH_GET_PACK` expects `[shallow-info]
[wanted-refs] [packfile-uris] packfile` and rejects any leading
acknowledgments section. The server now skips the acks section
entirely whenever `done` is in the request. Regression coverage: a
state-machine-level test in `UploadPackV2NegotiationTest` and a
real-git-client multi-round test that forces `done` alongside a
non-empty haves batch by driving the negotiator past `MAX_IN_VAIN`.
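
The rule in the fix above fits in a few lines. The `FetchResponseSketch` module and its functions below are hypothetical and are not the library's `UploadPackV2`. The sketch does two things:

- It encodes pkt-lines, where the four lowercase hex digits count the payload plus the four length bytes themselves.
- It picks which section opens the fetch response, depending on whether the request contained `done`.

For brevity it ignores `shallow-info`, `wanted-refs` and `packfile-uris`.

```elixir
defmodule FetchResponseSketch do
  # Hypothetical illustration of the protocol-v2 rule above; NOT the
  # library's UploadPackV2. Ignores shallow-info, wanted-refs and packfile-uris.

  # pkt-line: 4 lowercase hex digits giving the total length (payload + 4).
  # 65_516 = 65_520 (maximum pkt-line size) - 4 length bytes.
  def pkt_line(payload) when is_binary(payload) and byte_size(payload) <= 65_516 do
    hex = (byte_size(payload) + 4) |> Integer.to_string(16) |> String.downcase()
    String.pad_leading(hex, 4, "0") <> payload
  end

  # Leading sections of the fetch response. After `done` the client goes
  # straight to FETCH_GET_PACK, so acknowledgments must be omitted.
  def leading_sections(%{done: true}), do: [:packfile]
  def leading_sections(%{ready: true}), do: [:acknowledgments, :packfile]
  def leading_sections(_request), do: [:acknowledgments]

  # The first pkt-line the server writes for a given request.
  def first_line(request) do
    [section | _] = leading_sections(request)
    pkt_line(Atom.to_string(section) <> "\n")
  end
end
```

For example, `pkt_line("done\n")` produces `"0009done\n"`: five payload bytes plus four length bytes.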
### Added
- **Full protocol-v2 capability coverage for UploadPackV2.** Capability
advertisement now includes `ls-refs=unborn`, `fetch=shallow
wait-for-done filter`, `server-option`, and `object-format=sha1`.
Real `git` clients can now perform: `git clone` (symrefs, peel,
unborn-HEAD, annotated tags); `git clone --depth=N`,
`git fetch --deepen=N`, `git clone --shallow-since=<date>`,
`git clone --shallow-exclude=<ref>`; `git clone --filter=blob:none`,
`--filter=blob:limit=…`, `--filter=tree:N`, `--filter=object:type=…`,
`--filter=sparse:oid=…`, `--filter=combine:a+b`;
`git fetch --negotiate-only` (via `wait-for-done`); multi-round
negotiation on stateful transports.
- **Atomic push in ReceivePack.** `@capabilities` now includes
`atomic`. When the client sets the capability, ReceivePack validates
every ref update in Phase 1 and applies them in Phase 2; if any update
fails to apply mid-batch, every ref already touched is rolled back
from a pre-flight snapshot. Storage-layer failures during rollback
are logged and emitted via telemetry
(`[:ex_git_objectstore, :protocol, :receive_pack, :rollback_failed]`).
- **Partial-clone promisor support.** Fetches whose wants resolve to
blobs or trees (lazy-fetch from a promisor remote) bypass the
session filter so the exact requested object is returned. This is
what makes `git clone --filter=blob:none` (no `--no-checkout`)
work end-to-end.
- **Telemetry** spans + events for every new path:
- `[:ex_git_objectstore, :protocol, :fetch]` — span (start/stop)
with mode (`:full` / `:shallow` / `:filtered` /
`:shallow_filtered` / `:wait_for_done`), wants, haves, object
count, pack bytes.
- `[:ex_git_objectstore, :protocol, :receive_pack, :atomic]` —
span with outcome (`:committed` / `:rolled_back`), commands,
validation_failures.
- `[:ex_git_objectstore, :pack, :filter]` — event with
`objects_in`, `objects_out`, and spec kind.
- `ExGitObjectstore.Pack.Filter` — public parser + applicator for
every filter spec listed in
`Documentation/rev-list-options.txt`.
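
A minimal parser for the filter specs listed above might look like the following. `FilterSpecSketch` and its tagged-tuple return values are hypothetical and do not reflect the shape of `ExGitObjectstore.Pack.Filter`. The sketch follows git's documentation in two assumptions:

- `blob:limit` suffixes `k`/`m`/`g` mean KiB, MiB and GiB.
- `combine:` sub-specs are percent-encoded.

Unknown specs return an error rather than being dropped. This matches the changelog's note that unparseable filters are rejected with a band-3 error.

```elixir
defmodule FilterSpecSketch do
  # Hypothetical parser for the `--filter` grammar from git's
  # rev-list-options documentation; NOT the library's Pack.Filter API.
  # Anything unrecognised yields {:error, {:unsupported, spec}} so the
  # caller can reject it instead of silently sending a full pack.
  @units %{"" => 1, "k" => 1024, "m" => 1024 * 1024, "g" => 1024 * 1024 * 1024}

  def parse("blob:none"), do: {:ok, :blob_none}

  # blob:limit=<n>[kmg]; suffix accepted in either case.
  def parse("blob:limit=" <> n = spec) do
    with {v, unit} when v >= 0 <- Integer.parse(n),
         {:ok, mult} <- Map.fetch(@units, String.downcase(unit)) do
      {:ok, {:blob_limit, v * mult}}
    else
      _ -> {:error, {:unsupported, spec}}
    end
  end

  def parse("tree:" <> depth = spec) do
    case Integer.parse(depth) do
      {d, ""} when d >= 0 -> {:ok, {:tree_depth, d}}
      _ -> {:error, {:unsupported, spec}}
    end
  end

  def parse("object:type=" <> type) when type in ~w(blob tree commit tag),
    do: {:ok, {:object_type, String.to_atom(type)}}

  def parse("sparse:oid=" <> oid) when oid != "", do: {:ok, {:sparse_oid, oid}}

  # combine:<f1>+<f2>+...; each sub-spec is percent-encoded, so a literal
  # '+' inside one arrives as %2B and is decoded only after splitting.
  def parse("combine:" <> rest = spec) do
    result =
      rest
      |> String.split("+")
      |> Enum.reduce_while({:ok, []}, fn sub, {:ok, acc} ->
        case parse(URI.decode(sub)) do
          {:ok, filter} -> {:cont, {:ok, [filter | acc]}}
          {:error, _} -> {:halt, {:error, {:unsupported, spec}}}
        end
      end)

    with {:ok, filters} <- result, do: {:ok, {:combine, Enum.reverse(filters)}}
  end

  def parse(spec), do: {:error, {:unsupported, spec}}
end
```

For example, `parse("combine:blob:none+tree:3")` yields a `:combine` tuple holding `:blob_none` and `{:tree_depth, 3}`, in that order.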
### Fixed
- `UploadPackV2.feed/2` now treats only flush (`0000`) as a command
terminator, not delim (`0001`). Previously a TCP chunking split
that fell between the `command=` line's delim and the request's
final flush caused premature processing and dropped the rest of
the request. Closes ex_git_objectstore#29.
- `build_acknowledgments/3` always emits an `acknowledgments` section
when the client sent any `have` lines, preventing the two regressions
(`no 'ready'` and `expected 'acknowledgments'`) that reached production
during the earlier fix iteration. Requests that also contain `done` are
now covered by the `done` fix above, which omits the section entirely.
- Shallow walker no longer builds its queue with `rest ++ parent_items`,
which copied the remaining queue on every step; cost dropped from O(n²)
to O(n). A 500-commit shallow clone now completes in under 2 s (was > 10 s).
- `sanitize_error/1` in ReceivePack preserves up to 80 chars of
error detail. Git clients now see `ng <ref> hook_rejected:
CODEOWNERS approval required for main` instead of the bare tag.
- Unparseable `filter` specs in fetch are now rejected with a
band-3 protocol error (`ERR filter spec not supported: …`)
instead of being silently ignored. Prevents partial-clone clients
from recording `remote.origin.promisor=true` against a full pack.
### Added
- **Merge / rebase toolkit** — complete set of primitives for performing
merges and rebases in-process, without a working directory or any
shell-out to `git`. Unblocks fangorn/anvil#45. See fangorn/ex_git_objectstore#24.
- `write_tree/2` — write a tree from a list of entries.
- `commit_tree/3` — build and store a commit pointing at a tree. Accepts
either structured `%{name, email, when: DateTime}` identities or
pre-formatted git wire-format strings (useful for cherry-pick to
preserve author verbatim). Validates tree + parent SHAs exist and are
the right types. Returns typed error tuples (`{:error, {:missing_option, key}}`,
`{:missing_tree, sha}`, `{:missing_parent, sha}`, etc.) instead of raising.
Supports optional `:gpgsig`.
- `merge_branches/4` — resolves two refs, runs three-way merge against
their merge base, writes a two-parent merge commit. Returns
`{:error, {:conflicts, [...]}}` on conflict without writing.
- `squash_merge/4` — same three-way merge, single-parent commit —
history from `head` collapsed onto `base`.
- `cherry_pick/3` — three-way replay of one commit onto a new parent.
Preserves author verbatim (no parse/format round-trip), updates
committer, drops the GPG signature (rewrite invalidates it). Rejects
root commits and merge commits (latter pending `:mainline` support).
- `rebase_commits/4` — sequential cherry-pick of a list of commits onto
a new base. Halts on first conflict.
- `merge_base/3` — lowest common ancestor of two commits (top-level
delegate to `Walk.merge_base/3`).
- `ancestor?/3` — true if A is an ancestor of B (reflexive).
- `update_branch/4` — ergonomic wrapper over `Ref.put/3`, with optional
compare-and-swap via `expected_old_sha`.
- `format_identity/1` — identity map → git wire-format string; raw
strings pass through.
- `parse_identity/1` — git wire-format string → identity map, preserving
timezone offset on the returned `DateTime`.
- Apart from `update_branch/4`, none of these primitives updates any refs;
persisting merge / rebase results to branches is the caller's
responsibility.
- `blob_sizes/3` — batched variant of `blob_size/2` with bounded-concurrency
parallel reads, deduplication, and an `{:ok, %{sha => size}}` return value.
Eliminates the hundreds of sequential round trips previously needed to render
large directory listings on S3-backed storage. See fangorn/ex_git_objectstore#22.
- Repository integrity verification (`Fsck.check/2`) with full and quick modes
- `list_objects/2` callback on Storage behaviour for enumerating loose objects
- Dialyzer enforced in CI pipeline
- `Pack.Reader.read_object/4` with external resolver for REF_DELTA bases
- Full REF_DELTA / thin-pack resolution via `:external_resolver` on `Pack.Reader.parse/2`,
wired through ReceivePack so `git push --thin` and clones from thin-pack-producing
clients work end-to-end (integration-tested against real `git pack-objects --thin`)
- `update_hook` and `post_receive_hook` for ReceivePack (per-ref gating, post-push notifications)
- Configurable per-repo `max_object_size` via `Repo.new/2` (default 128MB unchanged)
- CalVer release automation via CI (`ci/release.sh`, `.anvil.yml` release step)
- Telemetry events for object read/write, ref updates, and receive-pack protocol
- S3 backend parallelism: `list_refs/3` now issues per-ref GETs concurrently
(`Task.async_stream` with `max_concurrency: 32`), and `put_pack/5` uploads
the `.pack` and `.idx` files concurrently. This cuts protocol-advertisement
latency on ref-heavy repos from ~25 s to ~1 s. Tunable via
`list_refs_concurrency`, `list_refs_timeout`, and `put_pack_timeout` in the
S3 config map. See fangorn/ex_git_objectstore#25.
- Storage backend telemetry: `[:ex_git_objectstore, :storage, :list_refs]`
and `[:ex_git_objectstore, :storage, :put_pack]` spans (S3 backend only for now)
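
The `merge_base/3` and `ancestor?/3` entries above both come down to reachability in the commit graph. The following is a small in-memory sketch, assuming a hypothetical `parents` map from commit id to parent ids. It is not the library's storage-backed `Walk` implementation. It returns every best common ancestor, so criss-cross histories can produce more than one.

```elixir
defmodule MergeBaseSketch do
  # Hypothetical in-memory illustration: `parents` maps a commit id to a list
  # of parent ids. NOT the library's storage-backed Walk.merge_base/3.

  # Every commit reachable from `start`, including `start` itself.
  def ancestors(parents, start), do: walk(parents, [start], MapSet.new())

  defp walk(_parents, [], seen), do: seen

  defp walk(parents, [commit | rest], seen) do
    if MapSet.member?(seen, commit) do
      walk(parents, rest, seen)
    else
      # Prepending the (short) parent list keeps each step cheap.
      walk(parents, Map.get(parents, commit, []) ++ rest, MapSet.put(seen, commit))
    end
  end

  # Reflexive: a commit counts as its own ancestor.
  def ancestor?(parents, a, b), do: MapSet.member?(ancestors(parents, b), a)

  # Best common ancestors: common ancestors that are not themselves
  # ancestors of another common ancestor. Not optimised; fine for a sketch.
  def merge_bases(parents, a, b) do
    common = MapSet.intersection(ancestors(parents, a), ancestors(parents, b))

    common
    |> Enum.reject(fn c ->
      Enum.any?(common, fn other -> other != c and ancestor?(parents, c, other) end)
    end)
    |> Enum.sort()
  end
end
```

For example, in a criss-cross history where `x2` and `y2` each merge both `x1` and `y1`, `merge_bases/3` returns `["x1", "y1"]`. How the library picks a single base in that case is not stated in this changelog.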
### Fixed
- `Diff.diff_blobs/4` spec now includes binary file return variant
- `Protocol.ReceivePack` state type includes all struct fields
- Resolved all Dialyzer warnings (4 type errors)
## [0.1.0] - 2026-02-10
### Added
- Git object encoding/decoding (blob, tree, commit, tag) with SHA-1 hashing
- Pluggable storage backends: Filesystem, S3, and Memory
- Ref management: branches, tags, HEAD, compare-and-swap updates
- Packfile support: read/write `.pack` and `.idx` v2 files
- Delta resolution (OFS_DELTA) in pack reader
- Three-way recursive tree merge with conflict detection
- Myers diff algorithm with unified diff output and context hunks
- Commit log traversal and merge base (LCA) finding
- Git wire protocol: pkt-line framing, upload-pack, receive-pack
- ETS-based LRU object cache with configurable size limits
- Atomic file writes and lock-file CAS for filesystem storage
- Path traversal prevention in filesystem storage
- Input validation for refs, SHAs, tree entry names, pack counts
- Apache 2.0 license with headers on all source files
- `mix license_check` task for CI enforcement
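
Object hashing in this release follows git's object convention: the id is the SHA-1 of a `<type> <size>` header, a NUL byte, and then the raw content. Below is a minimal sketch using Erlang's `:crypto`. The `GitObjectHashSketch` module is hypothetical and is not the library's encoder.

```elixir
defmodule GitObjectHashSketch do
  # Hypothetical helper, not the library's API: a git object id is the SHA-1
  # of the header "<type> <size>" plus a NUL byte, followed by the raw content.
  @types ~w(blob tree commit tag)

  def encode(type, content) when type in @types and is_binary(content) do
    "#{type} #{byte_size(content)}" <> <<0>> <> content
  end

  # Lowercase hex SHA-1 of the encoded object.
  def object_id(type, content) do
    :crypto.hash(:sha, encode(type, content)) |> Base.encode16(case: :lower)
  end
end
```

`object_id("blob", "")` yields `e69de29bb2d1d6434b8b29ae775ad8c2e48c5391`. This is the well-known empty-blob id that `git hash-object` reports for an empty file.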