@@ -1,0 +1,328 @@
# Red Team Journal — ExGitObjectstore
**Issue**: [#5](https://github.com/notifd/ex_git_objectstore/issues/5)
**Date**: 2026-02-10
**Status**: Complete
**Team**: security-hunter, perf-hunter, correctness-auditor, interop-tester, reliability-auditor
## Executive Summary
Red team assessment of ExGitObjectstore v0.1.0 found **25 unique significant findings**: 10 Critical, 10 High, 5 Medium. The library has fundamental architectural problems in pack handling that make it unusable for real-world repositories, silent data corruption in GPG signature round-tripping, and multiple denial-of-service vectors via resource exhaustion.
**Top 3 user-breaking problems:**
1. **ReceivePack stores delta objects with invalid type headers** — any `git push` with deltas (virtually all pushes) writes corrupted, unreadable objects
2. **GPG signatures corrupt on every round-trip** — continuation line double-spacing silently changes SHAs, breaking signed commits/tags
3. **Pack reader re-downloads entire packfile for every object lookup** — makes the library O(N × pack_size) for N lookups, unusable for repos with packs
---
## Findings
### Critical
#### C1. ReceivePack stores delta objects with invalid type headers [Interop, Correctness]
**File:** `lib/ex_git_objectstore/protocol/receive_pack.ex:337-358`
**Found by:** interop-tester
`store_pack_objects/2` calls `Object.encode_raw_from_type(entry.type, entry.data)` for every entry from `Reader.parse()`. But `Reader.parse()` returns unresolved delta entries with types `:ofs_delta` and `:ref_delta`. `encode_raw_from_type` converts these via `Atom.to_string(type)`, producing invalid git object headers like `"ofs_delta 22\0<delta_data>"`. Git only recognizes `blob`, `tree`, `commit`, `tag`.
**Impact:** Any `git push` containing delta-compressed objects (virtually all real pushes — git almost always delta-compresses) stores corrupted objects that are unreadable by both git and ExGitObjectstore. This is the primary user journey for receiving pushes and it is broken.
---
#### C2. GPG signature double-space corruption on round-trip [Correctness, Interop]
**Files:** `lib/ex_git_objectstore/object/commit.ex:107-136`, `lib/ex_git_objectstore/object/tag.ex:114-127`
**Found by:** correctness-auditor, interop-tester
`fold_continuation_lines/1` preserves the leading space from continuation lines when parsing. Then `encode_multiline_header/2` adds another leading space when encoding. Every parse→encode cycle adds an extra space to each continuation line.
**Evidence (interop-tester):** SHA changed from `2967e21b...` → `4d15c2bc...` → `2672b36b...` across three round-trips of the same commit.
**Impact:** Any code path that reads then re-writes a signed commit/tag silently corrupts the GPG signature and changes the SHA. Signed commits become unverifiable. This affects merge, rebase, cherry-pick, or any operation that re-encodes commits.
---
#### C3. Pack reader has NO decompression size limit — zlib bomb [Security, Performance]
**File:** `lib/ex_git_objectstore/pack/reader.ex:339-355`
**Found by:** security-hunter, perf-hunter, reliability-auditor
`decompress_data/1` in the pack reader calls `:zlib.inflate(z, data)` with zero size limits. Unlike `Object.safe_decompress/2` (which at least checks post-hoc), pack decompression has absolutely no size guard.
**Impact:** A malicious packfile sent via `git push` can contain a zlib bomb that decompresses to gigabytes, causing OOM and crashing the server. This is the primary DoS vector.
---
#### C4. Object.safe_decompress size check is post-hoc — zlib bomb still works [Security]
**File:** `lib/ex_git_objectstore/object.ex:217-236`
**Found by:** security-hunter
`safe_decompress/2` calls `:zlib.inflate(z, data)` which fully decompresses into memory **before** checking `byte_size(decompressed) > max_size`. A zlib bomb (small compressed → multi-GB decompressed) causes OOM before the check runs.
**Impact:** The 128MB decompression limit is illusory. A crafted loose object can still OOM the server.
---
#### C5. ObjectResolver re-downloads entire packfile for EVERY object lookup [Performance]
**File:** `lib/ex_git_objectstore/object_resolver.ex:52-57`
**Found by:** perf-hunter
`find_in_packs/3` calls `Repo.storage_call(repo, :get_pack, [pack_sha])` for each pack, downloading the **entire packfile** and **entire index** into memory. Neither is cached between lookups.
**Impact:** Looking up 100 objects in a repo with a 500MB pack downloads 500MB × 100 = 50GB of I/O. With S3 backend, this is catastrophically expensive. Makes the library fundamentally unusable for any repository that uses packfiles (i.e., all real repositories).
---
#### C6. Binary search for compressed length causes O(log N) redundant decompressions per object [Performance, Security]
**File:** `lib/ex_git_objectstore/pack/reader.ex:357-391`
**Found by:** security-hunter, perf-hunter, reliability-auditor
After decompressing each pack entry, `find_compressed_length` performs a binary search that re-decompresses the data O(log N) times with different prefix lengths. Combined with no size limits (C3), a single large object triggers multiple full decompressions.
**Impact:** Pack parsing is orders of magnitude slower than necessary. For a 100MB pack, each object incurs ~17 full decompressions. Amplifies the zlib bomb DoS by log2(compressed_size). The correct approach is to use `:zlib.inflateInit` and track bytes consumed from the Z_STREAM.
---
#### C7. Pack reader loads entire packfile into memory as single binary [Performance]
**File:** `lib/ex_git_objectstore/pack/reader.ex` (entire module)
**Found by:** perf-hunter
`parse/1` takes the entire packfile as a single binary argument. `read_object/2` also takes the full pack binary. There is no streaming interface.
**Impact:** A 1GB packfile requires 1GB+ of memory just to hold the binary, plus decompression buffers. Real-world git repos commonly have multi-GB packfiles. Library is unusable for repos over ~500MB.
---
#### C8. `build_sha_index` scans/decompresses every object for each REF_DELTA [Performance]
**File:** `lib/ex_git_objectstore/pack/reader.ex:401-419`
**Found by:** perf-hunter
When resolving a REF_DELTA, `find_sha_offset/3` calls `build_sha_index/1` which decompresses EVERY non-delta object to compute their SHAs. This index is not cached — it's rebuilt for every single REF_DELTA.
**Impact:** A pack with 1000 REF_DELTAs and 10000 total objects: decompresses all 10000 objects 1000 times = 10M decompressions.
---
#### C9. `list_refs` crashes with MatchError on concurrent ref deletion [Reliability]
**File:** `lib/ex_git_objectstore/storage/filesystem.ex:217`
**Found by:** reliability-auditor
Bare pattern match `{:ok, sha} = File.read(file)` inside `Enum.map`. If any ref file is deleted between `list_files_recursive` and `File.read` (race condition), or `File.read` returns an error, the entire call crashes with `MatchError`.
**Impact:** Application crash. Any concurrent ref deletion during a `list_refs` call triggers this. Common in production with concurrent pushes.
---
#### C10. Lock file leak on process crash — permanent ref deadlock [Reliability]
**File:** `lib/ex_git_objectstore/storage/filesystem.ex:146-193`
**Found by:** reliability-auditor
If the process is killed between creating the lock file and cleanup, the `.lock` file persists forever. All subsequent writes get `{:error, :ref_locked}` permanently. No stale lock detection (real git checks if the owning process is dead).
**Impact:** Permanent ref deadlock requiring manual intervention. A supervisor timeout during a ref update permanently locks that ref.
---
### High
#### H1. `put_pack` is not atomic — crash leaves pack without index [Reliability]
**File:** `lib/ex_git_objectstore/storage/filesystem.ex:110-118`
**Found by:** reliability-auditor
Writes `.pack` then `.idx` as separate `atomic_write` calls. Crash between them leaves an orphaned `.pack` without `.idx`. `list_packs` returns it, but `get_pack_index` returns `:not_found`, breaking `ObjectResolver.find_in_packs`.
**Impact:** Objects in the orphaned pack become unreachable. Data availability loss.
---
#### H2. S3 compare-and-swap is fundamentally racy [Reliability]
**File:** `lib/ex_git_objectstore/storage/s3.ex:133-155`
**Found by:** reliability-auditor
`put_ref` with `old_sha` does read-then-write with no atomicity guarantee. Between `s3_get` and `s3_put`, another process can write a different value.
**Impact:** Silent data loss — two concurrent pushes to the same branch can cause one to be silently overwritten. The CAS guarantee that protects against force-push races is broken on S3.
---
#### H3. Circular symbolic ref causes unbounded recursion → stack overflow [Reliability, Security]
**File:** `lib/ex_git_objectstore/ref.ex:99-105`
**Found by:** reliability-auditor
If HEAD → refs/heads/main → HEAD (or longer cycle), `resolve_ref` recurses infinitely. No depth limit on ref resolution.
**Impact:** Stack overflow crash. An attacker who can write refs can DoS the server.
---
#### H4. `merge_base` algorithm incorrect for criss-cross histories [Correctness]
**File:** `lib/ex_git_objectstore/walk.ex:83-103`
**Found by:** correctness-auditor
Builds two full ancestor sets then finds first common by BFS from sha_a. BFS explores in parent-list order (not topological order), so it may not find the LOWEST common ancestor. Does not handle multiple merge bases.
**Impact:** Incorrect merge bases lead to wrong merge results — false conflicts or silently wrong trees.
---
#### H5. Walk.log materializes ALL commits before returning [Performance]
**File:** `lib/ex_git_objectstore/walk.ex:45-71`
**Found by:** perf-hunter
`log/3` walks the entire commit graph up to limit, collects all commits, sorts by time, then applies skip/take. With `max_count: :infinity` (default), walks the entire graph.
**Impact:** `log(repo, sha)` on a repo with 1M commits materializes all 1M commit objects in memory.
---
#### H6. Diff mishandles files without trailing newline [Correctness]
**File:** `lib/ex_git_objectstore/diff/myers.ex:42-43`
**Found by:** correctness-auditor
`String.split(text, "\n", trim: false)` treats trailing newline as producing an empty string element. No `\ No newline at end of file` marker is produced. Files differing only in trailing newline show misleading diffs.
**Impact:** Common real-world scenario produces incorrect diff output.
---
#### H7. S3 key injection via unvalidated repo ID [Security]
**Files:** `lib/ex_git_objectstore/repo.ex:38-47`, `lib/ex_git_objectstore/storage/s3.ex:219-225`
**Found by:** security-hunter
`Repo.new/2` accepts any string as `id` without validation. `prefix/1` constructs `"repos/#{id}"`. S3 `safe_key` only checks for `..` but allows `/` and other special characters. `put_head`/`get_head` don't call `safe_key` at all.
**Impact:** If repo IDs are user-supplied, an attacker can read/write arbitrary S3 keys outside the expected prefix.
---
#### H8. Filesystem symlink following in `list_files_recursive` [Security]
**File:** `lib/ex_git_objectstore/storage/filesystem.ex:355-377`
**Found by:** security-hunter
`File.dir?(path)` follows symlinks. If a symlink exists inside the storage directory, recursive listing follows it outside the storage root. `safe_path` protects individual file operations but not directory enumeration.
**Impact:** Information disclosure of file names outside the storage root.
---
#### H9. O(N²) list concatenation in pack writer and upload-pack [Performance]
**Files:** `lib/ex_git_objectstore/pack/writer.ex:78-85`, `lib/ex_git_objectstore/protocol/upload_pack.ex:325-340,353-358,396-419`
**Found by:** perf-hunter
Multiple uses of `acc ++ [entry]` and `acc ++ new_objects` in reduce/map. Each `++` copies the left operand.
**Impact:** Writing a pack with 10k objects: ~50M element copies. Clone of 100k objects: ~5B element copies. Quadratic time for pack generation and clone/fetch.
---
#### H10. Commit/Tag parse crash on missing required fields [Reliability]
**Files:** `lib/ex_git_objectstore/object/commit.ex:160`, `lib/ex_git_objectstore/object/tag.ex:106`
**Found by:** reliability-auditor
`struct!(__MODULE__, fields)` raises `KeyError` if required fields are missing from malformed content. Crashes the reading process instead of returning `{:error, reason}`.
**Impact:** Any corrupted commit or tag object in the store crashes the reader.
---
### Medium
#### M1. ETS cache tables are `:public` — cross-process cache poisoning [Security]
**File:** `lib/ex_git_objectstore/cache/ets.ex:36-51`
**Found by:** security-hunter
Cache tables created with `:public` access. Any process in the BEAM VM can read/write/delete entries.
**Impact:** In multi-tenant environments, cache poisoning, information disclosure, or DoS.
---
#### M2. Merge doesn't distinguish file-vs-directory conflicts [Correctness]
**File:** `lib/ex_git_objectstore/merge.ex:162-175`
**Found by:** correctness-auditor
If "ours" has file `foo` and "theirs" has directory `foo/`, the conflict struct only records SHA, not mode. Caller cannot distinguish "both modified" from "file-vs-directory" type change.
**Impact:** Ambiguous conflict reporting; potential silent wrong behavior for type-change scenarios.
---
#### M3. ETS cache crash if table destroyed mid-operation [Reliability]
**File:** `lib/ex_git_objectstore/cache/ets.ex:71-82`
**Found by:** reliability-auditor
No try/rescue around ETS operations. If `ETS.destroy` is called while another process is in `get`/`put`, `:ets.lookup` raises `ArgumentError`.
**Impact:** Application crash in concurrent environments with cache lifecycle changes.
---
#### M4. Receive-pack doesn't validate ref names from client [Security]
**File:** `lib/ex_git_objectstore/protocol/receive_pack.ex:257-281`
**Found by:** security-hunter
`parse_command_line/1` extracts ref names from client data without validation. Validation eventually happens in `Ref.put/4`, but late — pre-hooks or logging see unvalidated input.
**Impact:** Defense-in-depth issue. Protocol layer should reject bad input at the boundary.
---
#### M5. Tree sort uses string comparison — wrong for non-ASCII filenames [Correctness]
**File:** `lib/ex_git_objectstore/object/tree.ex:143-151`
**Found by:** correctness-auditor
Sort uses Elixir string comparison (UTF-8 aware) instead of git's `base_name_compare()` (byte-by-byte unsigned comparison). For entries with bytes > 127, sort order may differ.
**Impact:** Trees with non-ASCII filenames produce different SHA-1 hashes than git. Objects will be incompatible with real git. (For ASCII-only repos this is not an issue.)
---
## Investigation Areas
- [x] Security: 10 findings (3 Critical, 2 High, 2 Medium from security-hunter)
- [x] Performance: 15 findings (4 Critical, 5 High, 6 Medium from perf-hunter)
- [x] Functionality: 15 findings (3 Critical, 4 High, 8 Medium/Low from correctness-auditor)
- [x] Interoperability: 3 findings (1 High, 1 Medium, 1 Low from interop-tester; plus extensive positive verification)
- [x] Reliability: 28 findings (6 Critical, 7 High, 10 Medium, 5 Low from reliability-auditor)
## What Works Well
The interop-tester verified that the following all work correctly against real git:
- Blob/tree/commit/tag encoding (SHA matches git for ASCII content)
- Loose object format passes `git fsck --strict`
- Pack writing passes `git verify-pack` and `git index-pack`
- Pack index is byte-for-byte identical to `git index-pack` output
- Reading git-generated packs (including delta objects)
- UTF-8 in author/committer/message
- Empty tree SHA matches well-known value
- Octopus merge (multiple parents) encodes correctly
- PktLine format encoding/decoding
## Appendix: Full Finding Cross-Reference
| ID | Severity | Area | Primary Finder | Also Found By |
|----|----------|------|---------------|---------------|
| C1 | Critical | Interop/Correctness | interop-tester | — |
| C2 | Critical | Correctness/Interop | correctness-auditor | interop-tester |
| C3 | Critical | Security/Performance | security-hunter | perf-hunter, reliability-auditor |
| C4 | Critical | Security | security-hunter | — |
| C5 | Critical | Performance | perf-hunter | — |
| C6 | Critical | Performance/Security | security-hunter | perf-hunter, reliability-auditor |
| C7 | Critical | Performance | perf-hunter | — |
| C8 | Critical | Performance | perf-hunter | correctness-auditor |
| C9 | Critical | Reliability | reliability-auditor | — |
| C10 | Critical | Reliability | reliability-auditor | — |
| H1 | High | Reliability | reliability-auditor | — |
| H2 | High | Reliability | reliability-auditor | — |
| H3 | High | Reliability/Security | reliability-auditor | — |
| H4 | High | Correctness | correctness-auditor | — |
| H5 | High | Performance | perf-hunter | — |
| H6 | High | Correctness | correctness-auditor | — |
| H7 | High | Security | security-hunter | — |
| H8 | High | Security | security-hunter | — |
| H9 | High | Performance | perf-hunter | — |
| H10 | High | Reliability | reliability-auditor | — |
| M1 | Medium | Security | security-hunter | — |
| M2 | Medium | Correctness | correctness-auditor | — |
| M3 | Medium | Reliability | reliability-auditor | — |
| M4 | Medium | Security | security-hunter | — |
| M5 | Medium | Correctness | correctness-auditor | — |