Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[Unreleased]

Added

  • Git LFS support (spec v1). Full Large File Storage implementation exposed as pure request/response modules, matching the existing UploadPack/ReceivePack style (no HTTP server in-tree).
    • ExGitObjectstore.Lfs.Pointer — parse and emit spec-compliant pointer blobs with strict validation (version-first, alphabetical key order, sha256-only OIDs, canonical decimal size, trailing LF).
    • ExGitObjectstore.Lfs.Store — behaviour parallel to Storage, keyed by SHA256. Streaming put/4 verifies the observed SHA256 matches the claimed OID and discards the write on mismatch. Shared conformance test suite at ExGitObjectstore.Test.LfsStoreConformance.
    • Backends: Lfs.Store.Memory, Lfs.Store.Filesystem, Lfs.Store.S3. S3 uses multipart upload for streaming PUT and exposes optional presigned_upload/5 / presigned_download/4 callbacks for direct-to-S3 client transfers.
    • ExGitObjectstore.Lfs.Batch — Batch API handler (POST /objects/batch) returning spec-compliant upload / download / verify actions. Uses presigned URLs when the backend supports them; falls back to library-served URLs for Filesystem/Memory.
    • ExGitObjectstore.Lfs.Transfer — basic-transfer handlers for GET/PUT /objects/:oid and POST /objects/:oid/verify, with streaming downloads and SHA256-verified uploads.
    • ExGitObjectstore.Lfs.Locks — Locks API v1: create, list, verify, unlock (with force for admin override). Lock metadata stored on the repo’s Storage backend under lfs/locks/*.json.
    • Telemetry spans emitted at [:ex_git_objectstore, :lfs, :batch | :transfer | :lock].
    • Repo gains optional :lfs_storage option alongside :storage.
    • End-to-end interop coverage against the real git lfs binary via a Bandit-backed test HTTP adapter. 11 scenarios exercise push, smudge (download), idempotent re-push, edge-case payload sizes (0-byte via direct HTTP, 1-byte via the CLI), multi-file batch pushes, mixed-state batches (present + absent), concurrent parallel uploads (10 files with lfs.concurrenttransfers=8), direct-HTTP OID tampering (server must respond 422 and leave nothing on disk), lock create/list/verify/unlock with conflict and 403-by-non-owner paths, and a full end-to-end git push → git clone → git lfs pull roundtrip over smart-http. Found and fixed one real bug: Batch.handle/3 was double-prefixing repos/<id>/lfs into action URLs. The :base_url is now the LFS root itself, and the module emits <base_url>/objects/<oid> and <base_url>/verify. The test adapter also wires the existing UploadPack/ReceivePack modules to the git smart-http v0 endpoints (GET /info/refs, POST /git-upload-pack, POST /git-receive-pack) so a real git clone can complete the full clone-then-lfs-pull flow.
    • S3 backend interop coverage: 14 conformance tests against real MinIO plus 2 end-to-end git lfs push/smudge scenarios that exercise the presigned-URL path — client uploads directly to MinIO via presigned PUTs and downloads via presigned GETs, with the library only mediating batch + verify.
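
For readers unfamiliar with the pointer format that Lfs.Pointer validates, the following is a minimal Python sketch of the strict rules listed above (version line first, remaining keys in alphabetical order, sha256-only OIDs, canonical decimal size, trailing LF). It is an illustration only, not the Elixir implementation; the names parse_pointer, emit_pointer, and PointerError are hypothetical.

```python
import re

VERSION_URL = "https://git-lfs.github.com/spec/v1"
_OID_RE = re.compile(r"[0-9a-f]{64}")
_SIZE_RE = re.compile(r"0|[1-9][0-9]*")  # canonical decimal: no leading zeros


class PointerError(ValueError):
    """Raised when a blob is not a strictly valid LFS pointer (hypothetical name)."""


def emit_pointer(oid: str, size: int) -> bytes:
    """Serialize an (oid, size) pair as a spec v1 pointer blob."""
    if not _OID_RE.fullmatch(oid):
        raise PointerError("oid must be 64 lowercase hex characters")
    if size < 0:
        raise PointerError("size must be non-negative")
    return f"version {VERSION_URL}\noid sha256:{oid}\nsize {size}\n".encode("ascii")


def parse_pointer(blob: bytes) -> dict:
    """Parse a pointer blob, rejecting anything that is not in canonical form."""
    try:
        text = blob.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise PointerError("pointer is not valid UTF-8") from exc
    if not text.endswith("\n"):
        raise PointerError("missing trailing LF")
    pairs = []
    for line in text[:-1].split("\n"):
        key, sep, value = line.partition(" ")
        if not (key and sep and value):
            raise PointerError(f"malformed line: {line!r}")
        pairs.append((key, value))
    if pairs[0] != ("version", VERSION_URL):
        raise PointerError("first line must be the v1 version line")
    keys = [key for key, _ in pairs[1:]]
    if "version" in keys or keys != sorted(set(keys)):
        raise PointerError("keys must be unique and in alphabetical order")
    fields = dict(pairs[1:])
    oid = fields.get("oid", "")
    if not oid.startswith("sha256:") or not _OID_RE.fullmatch(oid[7:]):
        raise PointerError("oid must be sha256:<64 lowercase hex>")
    if not _SIZE_RE.fullmatch(fields.get("size", "")):
        raise PointerError("size must be a canonical decimal")
    return {"oid": oid[7:], "size": int(fields["size"])}
```

A blob produced by emit_pointer round-trips through parse_pointer, while a reordered, non-canonical, or unterminated blob is rejected rather than repaired.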

Fixed

  • UploadPackV2: omit acknowledgments section when the client sends done. Real git fetch (v2.53) was failing against production with fatal: expected 'packfile', received 'acknowledgments'. Per fetch-pack.c: when send_fetch_request writes a done\n pkt-line it returns done_sent=1, and the client state machine transitions directly from FETCH_SEND_REQUEST to FETCH_GET_PACK — bypassing FETCH_PROCESS_ACKS. FETCH_GET_PACK expects [shallow-info] [wanted-refs] [packfile-uris] packfile and rejects any leading acknowledgments section. The server now skips the acks section entirely whenever done is in the request. Regression coverage: a state-machine-level test in UploadPackV2NegotiationTest and a real-git-client multi-round test that forces done alongside a non-empty haves batch by driving the negotiator past MAX_IN_VAIN.
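
The section-ordering rule behind this fix can be sketched as a small decision function. This is a simplified Python illustration of the protocol-v2 fetch response layout described above, not the UploadPackV2 code; the function name and its boolean parameters are assumptions, and the wanted-refs and packfile-uris sections are left out.

```python
from typing import List


def fetch_response_sections(done: bool, ready: bool, shallow: bool = False) -> List[str]:
    """Return the ordered section names of a v2 fetch response.

    done    -- the request contained a "done" line
    ready   -- the server has enough common commits to send a pack
    shallow -- the response carries shallow/unshallow updates
    """
    if done:
        # The client has left negotiation and expects the pack right away,
        # so no acknowledgments section may precede it.
        sections = []
    elif not ready:
        # Negotiation continues: acknowledgments only, no packfile yet.
        return ["acknowledgments"]
    else:
        # The server signals "ready" in acknowledgments, then sends the pack.
        sections = ["acknowledgments"]
    if shallow:
        sections.append("shallow-info")
    sections.append("packfile")
    return sections
```

With done=True the result never starts with "acknowledgments", which is the condition the client state machine enforces in FETCH_GET_PACK.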

Added

  • Full protocol-v2 capability coverage for UploadPackV2. Capability advertisement now includes ls-refs=unborn, fetch=shallow wait-for-done filter, server-option, and object-format=sha1. Real git clients can now perform: git clone (symrefs, peel, unborn-HEAD, annotated tags); git clone --depth=N, git fetch --deepen=N, git clone --shallow-since=<date>, git clone --shallow-exclude=<ref>; git clone --filter=blob:none, --filter=blob:limit=…, --filter=tree:N, --filter=object:type=…, --filter=sparse:oid=…, --filter=combine:a+b; git fetch --negotiate-only (via wait-for-done); multi-round negotiation on stateful transports.
  • Atomic push in ReceivePack. @capabilities now includes atomic. When the client sets the capability, ReceivePack validates every ref-update in Phase 1 and applies them in Phase 2; if any apply fails mid-batch we roll back every ref we already touched from a pre-flight snapshot. Storage-layer failures during rollback are logged and emitted via telemetry ([:ex_git_objectstore, :protocol, :receive_pack, :rollback_failed]).
  • Partial-clone promisor support. Fetches whose wants resolve to blobs or trees (lazy-fetch from a promisor remote) bypass the session filter so the exact requested object is returned. This is what makes git clone --filter=blob:none (no --no-checkout) work end-to-end.
  • Telemetry spans + events for every new path:
    • [:ex_git_objectstore, :protocol, :fetch] — span (start/stop) with mode (:full / :shallow / :filtered / :shallow_filtered / :wait_for_done), wants, haves, object count, pack bytes.
    • [:ex_git_objectstore, :protocol, :receive_pack, :atomic] — span with outcome (:committed / :rolled_back), commands, validation_failures.
    • [:ex_git_objectstore, :pack, :filter] — event with objects_in, objects_out, and spec kind.
  • ExGitObjectstore.Pack.Filter — public parser + applicator for every filter spec listed in Documentation/rev-list-options.txt.
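
To make the filter-spec grammar concrete, here is a minimal Python sketch of a parser for the spec forms named above. It does not reproduce Pack.Filter; the tuple shapes, the FilterError name, and the decision to URL-decode combine: sub-filters are assumptions made for illustration.

```python
import re
from urllib.parse import unquote

_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
_OBJECT_TYPES = {"blob", "tree", "commit", "tag"}


class FilterError(ValueError):
    """Raised for specs this sketch does not understand (hypothetical name)."""


def parse_filter(spec: str):
    """Turn a --filter=<spec> string into a small tagged tuple."""
    if spec == "blob:none":
        return ("blob_none",)
    if spec.startswith("blob:limit="):
        m = re.fullmatch(r"([0-9]+)([kmg]?)", spec[len("blob:limit="):].lower())
        if m:
            return ("blob_limit", int(m.group(1)) * _UNITS[m.group(2)])
    elif spec.startswith("tree:"):
        depth = spec[len("tree:"):]
        if re.fullmatch(r"[0-9]+", depth):
            return ("tree_depth", int(depth))
    elif spec.startswith("object:type="):
        kind = spec[len("object:type="):]
        if kind in _OBJECT_TYPES:
            return ("object_type", kind)
    elif spec.startswith("sparse:oid="):
        target = spec[len("sparse:oid="):]
        if target:
            return ("sparse_oid", target)
    elif spec.startswith("combine:"):
        parts = spec[len("combine:"):].split("+")
        # Assumption: each sub-filter may be URL-encoded, so decode before parsing.
        return ("combine", [parse_filter(unquote(part)) for part in parts])
    # Anything else is an error; ignoring it would silently yield a full pack.
    raise FilterError(f"filter spec not supported: {spec}")
```

For example, parse_filter("combine:blob:none+tree:2") yields a combine node with two children, and an unknown spec raises instead of being dropped.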

Fixed

  • UploadPackV2.feed/2 now treats only flush (0000) as a command terminator, not delim (0001). Previously a TCP chunking split that fell between the command= line’s delim and the request’s final flush caused premature processing and dropped the rest of the request. Closes ex_git_objectstore#29.

  • build_acknowledgments/3 always emits an acknowledgments section when the client sent any have, preventing the two regressions (no 'ready' + expected 'acknowledgments') that landed on production during the earlier fix iteration.

  • Shallow walker no longer uses rest ++ parent_items for its queue; O(n²) dropped to O(n). A 500-commit shallow clone now completes in under 2 s (was > 10 s).

  • sanitize_error/1 in ReceivePack preserves up to 80 chars of error detail. Git clients now see ng <ref> hook_rejected: CODEOWNERS approval required for main instead of the bare tag.

  • Unparseable filter specs in fetch are now rejected with a band-3 protocol error (ERR filter spec not supported: …) instead of being silently ignored. Prevents partial-clone clients from recording remote.origin.promisor=true against a full pack.
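
The feed/2 fix above comes down to which pkt-line marker ends a request. The Python sketch below buffers incoming chunks and hands back a request only once a flush (0000) arrives, treating delim (0001) as an ordinary separator. It is a hedged illustration of that rule, not the UploadPackV2 code; RequestBuffer and its return shape are hypothetical.

```python
FLUSH = b"0000"
DELIM = b"0001"


class RequestBuffer:
    """Accumulates transport chunks until one full v2 request is available."""

    def __init__(self):
        self._buf = b""

    def feed(self, chunk: bytes):
        """Append a chunk; return the request's pkts, or None if incomplete."""
        self._buf += chunk
        pkts, pos = [], 0
        while len(self._buf) - pos >= 4:
            header = self._buf[pos:pos + 4]
            if header == FLUSH:
                # Only flush terminates the request.
                self._buf = self._buf[pos + 4:]
                return pkts
            if header == DELIM:
                # delim separates the command from its arguments; keep reading.
                pkts.append("delim")
                pos += 4
                continue
            length = int(header, 16)  # the length includes the 4 header bytes
            if length < 4:
                raise ValueError(f"unexpected special pkt: {header!r}")
            if len(self._buf) - pos < length:
                break  # payload split across chunks; wait for more data
            pkts.append(self._buf[pos + 4:pos + length])
            pos += length
        return None
```

A chunk boundary that falls right after the delim therefore yields None, and the request is processed only when the final flush has been received.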

Added

  • Merge / rebase toolkit: a complete set of primitives for performing merges and rebases in-process, without a working directory or any shell-out to git. Unblocks fangorn/anvil#45. See fangorn/ex_git_objectstore#24.

    • write_tree/2 — write a tree from a list of entries.
    • commit_tree/3 — build and store a commit pointing at a tree. Accepts either structured %{name, email, when: DateTime} identities or pre-formatted git wire-format strings (useful for cherry-pick to preserve author verbatim). Validates tree + parent SHAs exist and are the right types. Typed error tuples ({:error, {:missing_option, key}}, {:missing_tree, sha}, {:missing_parent, sha}, etc.) instead of raises. Supports optional :gpgsig.
    • merge_branches/4 — resolves two refs, runs three-way merge against their merge base, writes a two-parent merge commit. Returns {:error, {:conflicts, [...]}} on conflict without writing.
    • squash_merge/4 — same three-way merge, single-parent commit — history from head collapsed onto base.
    • cherry_pick/3 — three-way replay of one commit onto a new parent. Preserves author verbatim (no parse/format round-trip), updates committer, drops the GPG signature (rewrite invalidates it). Rejects root commits and merge commits (latter pending :mainline support).
    • rebase_commits/4 — sequential cherry-pick of a list of commits onto a new base. Halts on first conflict.
    • merge_base/3 — lowest common ancestor of two commits (top-level delegate to Walk.merge_base/3).
    • ancestor?/3 — true if A is an ancestor of B (reflexive).
    • update_branch/4 — ergonomic wrapper over Ref.put/3, with optional compare-and-swap via expected_old_sha.
    • format_identity/1 — identity map → git wire-format string; raw strings pass through.
    • parse_identity/1 — git wire-format string → identity map, preserving timezone offset on the returned DateTime.

    None of these primitives update any refs beyond update_branch/4; persisting merge / rebase results to branches is the caller’s responsibility.
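
    The identity round-trip that format_identity/1 and parse_identity/1 describe can be illustrated with a short Python sketch of the git wire format (name, email in angle brackets, Unix seconds, then a ±hhmm offset). This is an analogue written for illustration, not the library's code; it assumes the DateTime equivalent is a timezone-aware datetime.

```python
import re
from datetime import datetime, timedelta, timezone

_IDENT_RE = re.compile(r"^(.*) <([^<>]*)> (\d+) ([+-])(\d{2})(\d{2})$")


def parse_identity(raw: str) -> dict:
    """Wire-format string -> identity dict, keeping the original UTC offset."""
    m = _IDENT_RE.match(raw)
    if not m:
        raise ValueError(f"not a git identity: {raw!r}")
    name, email, secs, sign, hh, mm = m.groups()
    offset = timedelta(hours=int(hh), minutes=int(mm))
    if sign == "-":
        offset = -offset
    when = datetime.fromtimestamp(int(secs), tz=timezone(offset))
    return {"name": name, "email": email, "when": when}


def format_identity(identity) -> str:
    """Identity dict -> wire-format string; raw strings pass through untouched."""
    if isinstance(identity, str):
        return identity
    when = identity["when"]  # assumed to be timezone-aware
    minutes = int((when.utcoffset() or timedelta(0)).total_seconds()) // 60
    sign = "+" if minutes >= 0 else "-"
    hh, mm = divmod(abs(minutes), 60)
    stamp = int(when.timestamp())
    return f'{identity["name"]} <{identity["email"]}> {stamp} {sign}{hh:02d}{mm:02d}'
```

    Passing a raw string straight through is what lets a cherry-pick preserve the author line byte for byte instead of re-deriving it.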

  • blob_sizes/3: batched variant of blob_size/2 with bounded-concurrency parallel reads and deduplication, returning {:ok, %{sha => size}}. Eliminates the hundreds of sequential round trips previously needed to render large directory listings on S3-backed storage. See fangorn/ex_git_objectstore#22.

  • Repository integrity verification (Fsck.check/2) with full and quick modes

  • list_objects/2 callback on Storage behaviour for enumerating loose objects

  • Dialyzer enforced in CI pipeline

  • Pack.Reader.read_object/4 with external resolver for REF_DELTA bases

  • Full REF_DELTA / thin-pack resolution via :external_resolver on Pack.Reader.parse/2, wired through ReceivePack so git push --thin and clones from thin-pack-producing clients work end-to-end (integration-tested against real git pack-objects --thin)

  • update_hook and post_receive_hook for ReceivePack (per-ref gating, post-push notifications)

  • Configurable per-repo max_object_size via Repo.new/2 (default 128MB unchanged)

  • CalVer release automation via CI (ci/release.sh, .anvil.yml release step)

  • Telemetry events for object read/write, ref updates, and receive-pack protocol

  • S3 backend parallelism: list_refs/3 now issues per-ref GETs concurrently (Task.async_stream with max_concurrency: 32), and put_pack/5 uploads the .pack and .idx files concurrently. Drops protocol-advertisement latency on ref-heavy repos from ~25 s to ~1 s. Tunable via list_refs_concurrency, list_refs_timeout, and put_pack_timeout in the S3 config map. See fangorn/ex_git_objectstore#25.

  • Storage backend telemetry: [:ex_git_objectstore, :storage, :list_refs] and [:ex_git_objectstore, :storage, :put_pack] spans (S3 backend only for now)
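
The bounded-concurrency pattern behind blob_sizes/3 and the parallel list_refs/3 can be sketched in Python as follows. The library itself uses Task.async_stream; this analogue uses a thread pool, and the fetch_size callback is a hypothetical stand-in for one storage round trip.

```python
from concurrent.futures import ThreadPoolExecutor


def blob_sizes(fetch_size, shas, max_concurrency=32):
    """Look up many blob sizes with at most max_concurrency reads in flight.

    fetch_size -- callable taking one SHA and returning its size
                  (hypothetical; stands in for a single backend request)
    shas       -- iterable of SHAs, possibly containing duplicates
    """
    # Deduplicate while keeping first-seen order so each SHA is read once.
    unique = list(dict.fromkeys(shas))
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        # map() yields results in input order, so zip pairs them correctly.
        return dict(zip(unique, pool.map(fetch_size, unique)))
```

Capping the pool size keeps a large directory listing from opening an unbounded number of connections to the backend, while deduplication avoids paying twice for a blob that appears under several paths.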

Fixed

  • Diff.diff_blobs/4 spec now includes binary file return variant
  • Protocol.ReceivePack state type includes all struct fields
  • Resolved all Dialyzer warnings (4 type errors)

[0.1.0] - 2026-02-10

Added

  • Git object encoding/decoding (blob, tree, commit, tag) with SHA-1 hashing
  • Pluggable storage backends: Filesystem, S3, and Memory
  • Ref management: branches, tags, HEAD, compare-and-swap updates
  • Packfile support: read/write .pack and .idx v2 files
  • Delta resolution (OFS_DELTA) in pack reader
  • Three-way recursive tree merge with conflict detection
  • Myers diff algorithm with unified diff output and context hunks
  • Commit log traversal and merge base (LCA) finding
  • Git wire protocol: pkt-line framing, upload-pack, receive-pack
  • ETS-based LRU object cache with configurable size limits
  • Atomic file writes and lock-file CAS for filesystem storage
  • Path traversal prevention in filesystem storage
  • Input validation for refs, SHAs, tree entry names, pack counts
  • Apache 2.0 license with headers on all source files
  • mix license_check task for CI enforcement