Changelog
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
[Unreleased]
Added
- Git LFS support (spec v1). Full Large File Storage implementation exposed as pure request/response modules, matching the existing UploadPack / ReceivePack style (no HTTP server in-tree).
  - ExGitObjectstore.Lfs.Pointer: parse and emit spec-compliant pointer blobs with strict validation (version-first, alphabetical key order, sha256-only OIDs, canonical decimal size, trailing LF).
  - ExGitObjectstore.Lfs.Store: behaviour parallel to Storage, keyed by SHA256. Streaming put/4 verifies the observed SHA256 matches the claimed OID and discards the write on mismatch. Shared conformance test suite at ExGitObjectstore.Test.LfsStoreConformance.
  - Backends: Lfs.Store.Memory, Lfs.Store.Filesystem, Lfs.Store.S3. S3 uses multipart upload for streaming PUT and exposes optional presigned_upload/5 / presigned_download/4 callbacks for direct-to-S3 client transfers.
  - ExGitObjectstore.Lfs.Batch: Batch API handler (POST /objects/batch) returning spec-compliant upload / download / verify actions. Uses presigned URLs when the backend supports them; falls back to library-served URLs for Filesystem/Memory.
  - ExGitObjectstore.Lfs.Transfer: basic-transfer handlers for GET/PUT /objects/:oid and POST /objects/:oid/verify, with streaming downloads and SHA256-verified uploads.
  - ExGitObjectstore.Lfs.Locks: Locks API v1: create, list, verify, unlock (with force for admin override). Lock metadata stored on the repo’s Storage backend under lfs/locks/*.json.
  - Telemetry spans emitted at [:ex_git_objectstore, :lfs, :batch | :transfer | :lock].
  - Repo gains optional :lfs_storage option alongside :storage.
- End-to-end interop coverage against the real git lfs binary via a Bandit-backed test HTTP adapter. 11 scenarios exercise push, smudge (download), idempotent re-push, edge-case payload sizes (0-byte via direct HTTP, 1-byte via the CLI), multi-file batch pushes, mixed-state batches (present + absent), concurrent parallel uploads (10 files with lfs.concurrenttransfers=8), direct-HTTP OID tampering (server must 422 and leave nothing on disk), lock create/list/verify/unlock with conflict and 403-by-non-owner paths, and a full end-to-end git push → git clone → git lfs pull roundtrip over smart-http. Found and fixed one real bug: Batch.handle/3 was double-prefixing repos/<id>/lfs into action URLs; the :base_url is now the LFS root itself and the module emits <base_url>/objects/<oid> and <base_url>/verify. The test adapter also wires the existing UploadPack / ReceivePack modules to the git smart-http v0 endpoints (GET /info/refs, POST /git-upload-pack, POST /git-receive-pack) so a real git clone can complete the full clone-then-lfs-pull flow.
- S3 backend interop coverage: 14 conformance tests against real MinIO plus 2 end-to-end git lfs push/smudge scenarios that exercise the presigned-URL path: the client uploads directly to MinIO via presigned PUTs and downloads via presigned GETs, with the library only mediating batch + verify.
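For readers unfamiliar with the pointer format that the strict validation refers to, here is a minimal sketch of emitting and parsing a spec v1 pointer blob. `PointerSketch` and its functions are hypothetical names for illustration and are not the API of ExGitObjectstore.Lfs.Pointer; the sketch also assumes a pointer with no extension keys.

```elixir
defmodule PointerSketch do
  # Hypothetical illustration only; not the library's Lfs.Pointer module.
  @version "https://git-lfs.github.com/spec/v1"

  # Emit a pointer blob: version line first, remaining keys in alphabetical
  # order (oid, size), every line terminated by a single LF.
  def encode(oid, size)
      when is_binary(oid) and byte_size(oid) == 64 and is_integer(size) and size >= 0 do
    "version #{@version}\noid sha256:#{oid}\nsize #{size}\n"
  end

  # Strict parse: accept only the canonical three-line layout with a trailing LF.
  def parse(blob) when is_binary(blob) do
    with [version_line, "oid sha256:" <> oid, "size " <> size_str, ""] <-
           String.split(blob, "\n"),
         true <- version_line == "version " <> @version,
         # sha256-only OIDs: exactly 64 lowercase hex characters.
         true <- oid =~ ~r/\A[0-9a-f]{64}\z/,
         {size, ""} <- Integer.parse(size_str),
         # Canonical decimal size: no sign, no leading zeros.
         true <- size >= 0 and Integer.to_string(size) == size_str do
      {:ok, %{oid: oid, size: size}}
    else
      _ -> {:error, :invalid_pointer}
    end
  end
end
```

Under these assumptions a blob such as `size 012` or one missing the trailing LF is rejected rather than normalised, which is the behaviour the strict-validation wording suggests.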
Fixed
- UploadPackV2: omit acknowledgments section when the client sends done. Real git fetch (v2.53) was failing against production with fatal: expected 'packfile', received 'acknowledgments'. Per fetch-pack.c: when send_fetch_request writes a done\n pkt-line it returns done_sent=1, and the client state machine transitions directly from FETCH_SEND_REQUEST to FETCH_GET_PACK, bypassing FETCH_PROCESS_ACKS. FETCH_GET_PACK expects [shallow-info] [wanted-refs] [packfile-uris] packfile and rejects any leading acknowledgments section. The server now skips the acks section entirely whenever done is in the request. Regression coverage: a state-machine-level test in UploadPackV2NegotiationTest and a real-git-client multi-round test that forces done alongside a non-empty haves batch by driving the negotiator past MAX_IN_VAIN.
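The rule behind this fix can be sketched as a small decision function. The module name and the `done` / `ready` map keys are hypothetical and only model the choice of response sections; the real UploadPackV2 state is not shown in this changelog.

```elixir
defmodule FetchResponseSketch do
  # Hypothetical model of which protocol-v2 fetch response sections are sent.

  # Client sent "done": it expects the packfile next, so no acknowledgments.
  def sections(%{done: true}), do: [:packfile]

  # Negotiation finished on the server side ("ready"): acks, then the packfile.
  def sections(%{done: false, ready: true}), do: [:acknowledgments, :packfile]

  # Still negotiating: acknowledgments only, the client sends another round.
  def sections(%{done: false}), do: [:acknowledgments]
end
```

Optional sections such as shallow-info or wanted-refs are left out of the sketch for brevity.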
Added
- Full protocol-v2 capability coverage for UploadPackV2. Capability advertisement now includes ls-refs=unborn, fetch=shallow wait-for-done filter, server-option, and object-format=sha1. Real git clients can now perform:
  - git clone (symrefs, peel, unborn-HEAD, annotated tags);
  - git clone --depth=N, git fetch --deepen=N, git clone --shallow-since=<date>, git clone --shallow-exclude=<ref>;
  - git clone --filter=blob:none, --filter=blob:limit=…, --filter=tree:N, --filter=object:type=…, --filter=sparse:oid=…, --filter=combine:a+b;
  - git fetch --negotiate-only (via wait-for-done);
  - multi-round negotiation on stateful transports.
- Atomic push in ReceivePack. @capabilities now includes atomic. When the client sets the capability, ReceivePack validates every ref-update in Phase 1 and applies them in Phase 2; if any apply fails mid-batch we roll back every ref we already touched from a pre-flight snapshot. Storage-layer failures during rollback are logged and emitted via telemetry ([:ex_git_objectstore, :protocol, :receive_pack, :rollback_failed]).
- Partial-clone promisor support. Fetches whose wants resolve to blobs or trees (lazy-fetch from a promisor remote) bypass the session filter so the exact requested object is returned. This is what makes git clone --filter=blob:none (no --no-checkout) work end-to-end.
- Telemetry spans + events for every new path:
  - [:ex_git_objectstore, :protocol, :fetch]: span (start/stop) with mode (:full / :shallow / :filtered / :shallow_filtered / :wait_for_done), wants, haves, object count, pack bytes.
  - [:ex_git_objectstore, :protocol, :receive_pack, :atomic]: span with outcome (:committed / :rolled_back), commands, validation_failures.
  - [:ex_git_objectstore, :pack, :filter]: event with objects_in, objects_out, and spec kind.
- ExGitObjectstore.Pack.Filter: public parser + applicator for every filter spec listed in Documentation/rev-list-options.txt.
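The two-phase shape of the atomic push entry above can be illustrated with an in-memory model. Everything here is an assumption for illustration: `AtomicRefsSketch`, the `{ref, expected_old, new}` command tuple, and the caller-supplied `put` function stand in for ReceivePack and the storage backend, whose real interfaces are not shown in this changelog.

```elixir
defmodule AtomicRefsSketch do
  # Hypothetical in-memory model: `refs` is a map of ref name => sha.
  # `put` is a function (refs, ref, new_sha) -> {:ok, refs} | {:error, reason}
  # standing in for a storage write that may fail.

  def apply_all(refs, commands, put) do
    # Phase 1: validate every command before touching anything.
    case Enum.find(commands, fn {ref, old, _new} -> Map.get(refs, ref) != old end) do
      nil -> apply_phase(refs, commands, put)
      {ref, _old, _new} -> {:error, {:stale_ref, ref}, refs}
    end
  end

  # Phase 2: apply in order; on the first failure, fall back to the
  # pre-flight snapshot so no partial update is visible.
  defp apply_phase(snapshot, commands, put) do
    Enum.reduce_while(commands, {:ok, snapshot}, fn {ref, _old, new}, {:ok, acc} ->
      case put.(acc, ref, new) do
        {:ok, next} -> {:cont, {:ok, next}}
        {:error, reason} -> {:halt, {:rolled_back, reason, snapshot}}
      end
    end)
  end
end
```

With a real storage backend the rollback is itself a series of writes that can fail, which is presumably why the entry mentions a dedicated rollback_failed telemetry event; the in-memory model sidesteps that by simply returning the snapshot.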
Fixed
- UploadPackV2.feed/2 now treats only flush (0000) as a command terminator, not delim (0001). Previously a TCP chunking split that fell between the command= line’s delim and the request’s final flush caused premature processing and dropped the rest of the request. Closes ex_git_objectstore#29.
- build_acknowledgments/3 always emits an acknowledgments section when the client sent any have, preventing the two regressions (no 'ready' + expected 'acknowledgments') that landed on production during the earlier fix iteration.
- Shallow walker no longer uses rest ++ parent_items for its queue; O(n²) dropped to O(n). A 500-commit shallow clone now completes in under 2 s (was > 10 s).
- sanitize_error/1 in ReceivePack preserves up to 80 chars of error detail. Git clients now see ng <ref> hook_rejected: CODEOWNERS approval required for main instead of the bare tag.
- Unparseable filter specs in fetch are now rejected with a band-3 protocol error (ERR filter spec not supported: …) instead of being silently ignored. Prevents partial-clone clients from recording remote.origin.promisor=true against a full pack.
- Merge / rebase toolkit: complete set of primitives for performing merges and rebases in-process, without a working directory or any shell-out to git. Unblocks fangorn/anvil#45. See fangorn/ex_git_objectstore#24.
  - write_tree/2: write a tree from a list of entries.
  - commit_tree/3: build and store a commit pointing at a tree. Accepts either structured %{name, email, when: DateTime} identities or pre-formatted git wire-format strings (useful for cherry-pick to preserve author verbatim). Validates tree + parent SHAs exist and are the right types. Typed error tuples ({:error, {:missing_option, key}}, {:missing_tree, sha}, {:missing_parent, sha}, etc.) instead of raises. Supports optional :gpgsig.
  - merge_branches/4: resolves two refs, runs three-way merge against their merge base, writes a two-parent merge commit. Returns {:error, {:conflicts, [...]}} on conflict without writing.
  - squash_merge/4: same three-way merge, single-parent commit; history from head collapsed onto base.
  - cherry_pick/3: three-way replay of one commit onto a new parent. Preserves author verbatim (no parse/format round-trip), updates committer, drops the GPG signature (rewrite invalidates it). Rejects root commits and merge commits (latter pending :mainline support).
  - rebase_commits/4: sequential cherry-pick of a list of commits onto a new base. Halts on first conflict.
  - merge_base/3: lowest common ancestor of two commits (top-level delegate to Walk.merge_base/3).
  - ancestor?/3: true if A is an ancestor of B (reflexive).
  - update_branch/4: ergonomic wrapper over Ref.put/3, with optional compare-and-swap via expected_old_sha.
  - format_identity/1: identity map → git wire-format string; raw strings pass through.
  - parse_identity/1: git wire-format string → identity map, preserving timezone offset on the returned DateTime.

  None of these primitives update any refs beyond update_branch/4; persisting merge / rebase results to branches is the caller’s responsibility.
- blob_sizes/3: batched variant of blob_size/2 with bounded-concurrency parallel reads, deduplication, and {:ok, %{sha => size}} return. Drops the 100s-of-sequential-round-trips cost of rendering large directory listings on S3-backed storage. See fangorn/ex_git_objectstore#22.
- Repository integrity verification (Fsck.check/2) with full and quick modes
- list_objects/2 callback on Storage behaviour for enumerating loose objects
- Dialyzer enforced in CI pipeline
- Pack.Reader.read_object/4 with external resolver for REF_DELTA bases
- Full REF_DELTA / thin-pack resolution via :external_resolver on Pack.Reader.parse/2, wired through ReceivePack so git push --thin and clones from thin-pack-producing clients work end-to-end (integration-tested against real git pack-objects --thin)
- update_hook and post_receive_hook for ReceivePack (per-ref gating, post-push notifications)
- Configurable per-repo max_object_size via Repo.new/2 (default 128MB unchanged)
- CalVer release automation via CI (ci/release.sh, .anvil.yml release step)
- Telemetry events for object read/write, ref updates, and receive-pack protocol
- S3 backend parallelism: list_refs/3 now issues per-ref GETs concurrently (Task.async_stream with max_concurrency: 32), and put_pack/5 uploads the .pack and .idx files concurrently. Drops protocol-advertisement latency on ref-heavy repos from ~25 s → ~1 s. Tunable via list_refs_concurrency, list_refs_timeout, and put_pack_timeout in the S3 config map. See fangorn/ex_git_objectstore#25.
- Storage backend telemetry: [:ex_git_objectstore, :storage, :list_refs] and [:ex_git_objectstore, :storage, :put_pack] spans (S3 backend only for now)
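The flush-versus-delim fix in this list is easier to follow with the pkt-line framing spelled out. The sketch below is a hypothetical helper, not the library's feed/2: it scans a buffer of pkt-lines and reports a request as complete only once a flush packet (0000) is seen, treating delim (0001) as a separator inside the request.

```elixir
defmodule PktLineSketch do
  # Hypothetical illustration only; not UploadPackV2.feed/2.
  # Returns {:complete, packets, rest} once a flush packet is found,
  # {:incomplete, buffer} if more bytes are needed, or {:error, reason}.
  def take_request(buffer), do: scan(buffer, [], buffer)

  # Flush (0000) terminates the request.
  defp scan("0000" <> rest, acc, _orig), do: {:complete, Enum.reverse(acc), rest}

  # Delim (0001) only separates sections inside a request; keep scanning.
  defp scan("0001" <> rest, acc, orig), do: scan(rest, [:delim | acc], orig)

  # Data packet: four hex digits give the total length, including the header.
  defp scan(<<hex::binary-size(4), rest::binary>>, acc, orig) do
    case Integer.parse(hex, 16) do
      {len, ""} when len >= 4 ->
        n = len - 4

        case rest do
          <<payload::binary-size(n), tail::binary>> -> scan(tail, [payload | acc], orig)
          # Payload not fully received yet: wait for the next chunk.
          _ -> {:incomplete, orig}
        end

      _ ->
        {:error, :bad_length}
    end
  end

  # Fewer than four bytes left: the header itself is incomplete.
  defp scan(_short, _acc, orig), do: {:incomplete, orig}
end
```

A chunk that ends right after the delim yields `{:incomplete, buffer}` here, so the caller keeps buffering instead of processing a truncated request, which is the failure mode the fix describes.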
Fixed
- Diff.diff_blobs/4 spec now includes binary file return variant
- Protocol.ReceivePack state type includes all struct fields
- Resolved all Dialyzer warnings (4 type errors)
[0.1.0] - 2026-02-10
Added
- Git object encoding/decoding (blob, tree, commit, tag) with SHA-1 hashing
- Pluggable storage backends: Filesystem, S3, and Memory
- Ref management: branches, tags, HEAD, compare-and-swap updates
- Packfile support: read/write .pack and .idx v2 files
- Delta resolution (OFS_DELTA) in pack reader
- Three-way recursive tree merge with conflict detection
- Myers diff algorithm with unified diff output and context hunks
- Commit log traversal and merge base (LCA) finding
- Git wire protocol: pkt-line framing, upload-pack, receive-pack
- ETS-based LRU object cache with configurable size limits
- Atomic file writes and lock-file CAS for filesystem storage
- Path traversal prevention in filesystem storage
- Input validation for refs, SHAs, tree entry names, pack counts
- Apache 2.0 license with headers on all source files
- mix license_check task for CI enforcement