feat(echo-cas): content-addressed blob store (Phase 1) #263
flyingrobots merged 4 commits into `main`
Conversation
Introduces `echo-cas`, a leaf crate providing the `BlobStore` trait and an in-memory implementation (`MemoryTier`) for content-addressed storage keyed by BLAKE3 hash. Sufficient for the in-browser website demo; disk/cold tiers and GC are deferred to Phase 3.
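As a rough picture of the intended call shape (a sketch only, assuming the names from the description above: `MemoryTier::new`, `put`, `get`, `pin`; the exact signatures live in the crate and may differ):

```rust
use echo_cas::{blob_hash, BlobStore, MemoryTier};

fn main() {
    let mut store = MemoryTier::new();

    // Store a blob; the returned key is its BLAKE3 hash.
    let hash = store.put(b"hello, echo");

    // Content addressing: identical bytes always map to the same key.
    assert_eq!(hash, blob_hash(b"hello, echo"));

    // Retrieval is keyed by hash; misses return None.
    let bytes = store.get(&hash).expect("blob was just stored");
    assert_eq!(&bytes[..], b"hello, echo");

    // Pinning marks a blob to survive GC (GC itself is Phase 3).
    store.pin(&hash);
}
```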
Walkthrough
A new `echo-cas` crate is added to the workspace, providing a content-addressed blob store: a 32-byte BLAKE3 `BlobHash` type, a `blob_hash()` helper, a `CasError` enum, a sync, object-safe `BlobStore` trait (`put` / `put_verified` / `get` / `has` / `pin` / `unpin`), and an in-memory `MemoryTier` implementation with optional byte budgeting and tests.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~23 minutes
🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 12
🤖 Fix all issues with AI agents
In `@crates/echo-cas/Cargo.toml`:
- Line 19: Remove the dangling empty [dev-dependencies] header in Cargo.toml or
make it intentional: either delete the entire [dev-dependencies] section if
there are no dev dependencies, or populate it (e.g., add proptest/criterion) or
replace the header with a commented TODO (e.g., "# TODO: add dev-dependencies")
so the presence of the section is explicit; locate the orphaned
"[dev-dependencies]" token in the Cargo.toml and apply one of these options.
- Line 17: Update the thiserror dependency in Cargo.toml from the 1.x line to
the 2.x line: replace the dependency spec `thiserror = "1"` with `thiserror =
"2"` (or a caret/semver range like `^2`) and then run cargo update / cargo build
to verify proc-macro derives still compile for wasm32; if there is a documented
downstream blocker, add a comment next to the thiserror entry explaining why v2
cannot be used.
In `@crates/echo-cas/README.md`:
- Line 1: The README's SPDX header ("Apache-2.0 OR MIND-UCAL-1.0") conflicts
with the workspace Cargo.toml which uses license.workspace = true and the root
workspace license = "Apache-2.0"; fix by making the license declarations
consistent: either update the workspace/root Cargo.toml to include the
dual-license string "Apache-2.0 OR MIND-UCAL-1.0" (so all crates inherit the OR
license) or remove "MIND-UCAL-1.0" from the README SPDX header; check and update
the crate's Cargo.toml/license or the root workspace license field accordingly
to ensure Cargo.toml, license.workspace, and README.md match.
In `@crates/echo-cas/src/lib.rs`:
- Around line 60-62: Change BlobHash to hide its inner field (make the [u8; 32]
private) so callers cannot construct it directly; keep #[repr(transparent)] and
the derives, expose a single safe public constructor blob_hash(bytes: [u8;32])
(already present) for normal creation, add a from_bytes([u8;32]) constructor
intended for deserialization/wire use with a doc comment "caller asserts this
came from BLAKE3", and keep an as_bytes(&self) -> &[u8;32] accessor for
read-only access; if you intentionally allowed public construction, add a clear
doc warning on the BlobHash struct that constructing from non-BLAKE3 bytes is
legal but semantically meaningless. A sketch of this shape follows this list.
- Around line 130-132: The current trait method signature fn get(&self, hash:
&BlobHash) -> Option<Arc<[u8]>> bakes an Arc-backed in-memory representation
into the trait contract; add a FIXME comment above the get signature (reference
symbols: get, BlobHash, Arc<[u8]>) stating FIXME(phase3): avoid forcing
Arc<[u8]> — consider returning bytes::Bytes or using an associated type (e.g.,
type Blob: AsRef<[u8]>) or another zero-copy representation in Phase 3, so
future DiskTier/ColdTier implementations can choose mmap/Bytes/Cow/streams
instead of allocating an Arc (a sketch follows this list).
- Around line 71-78: The Display impl for BlobHash (fn fmt on BlobHash) does 32
separate write! calls via the Formatter; instead, build the full hex string
first and do a single write to the Formatter. In fmt, allocate a String with
capacity self.0.len()*2, use std::fmt::Write::write_fmt or write! into that
String in the loop to append each byte as "{:02x}", then call
f.write_str(&hex_string) once; this keeps the existing Display behavior but
reduces Formatter invocations while remaining safe (no unsafe code); see the sketch after this list.
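Taken together, the `BlobHash` encapsulation and the `Display` rewrite might look like the sketch below. This is a minimal illustration assuming the derives shown; the crate's actual derive list and doc text may differ.

```rust
use std::fmt::{self, Write as _};

/// 32-byte BLAKE3 content hash.
///
/// The inner field is private: construct via `blob_hash()`, or via
/// `from_bytes()` when deserializing.
#[repr(transparent)]
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub struct BlobHash([u8; 32]);

impl BlobHash {
    /// Constructor for deserialization/wire use: the caller asserts
    /// these bytes came from BLAKE3.
    pub fn from_bytes(bytes: [u8; 32]) -> Self {
        Self(bytes)
    }

    /// Read-only access to the raw hash bytes.
    pub fn as_bytes(&self) -> &[u8; 32] {
        &self.0
    }
}

impl fmt::Display for BlobHash {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Build the full hex string once, then hand it to the Formatter
        // in a single call instead of 32 separate write!s.
        let mut hex = String::with_capacity(self.0.len() * 2);
        for byte in self.0 {
            write!(hex, "{byte:02x}")?;
        }
        f.write_str(&hex)
    }
}
```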
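The `get` comment is purely advisory; roughly (trait shape assumed from the review, using the `BlobHash` from the previous sketch, not copied from the crate):

```rust
use std::sync::Arc;

pub trait BlobStore {
    // FIXME(phase3): avoid forcing Arc<[u8]>. Consider bytes::Bytes or an
    // associated `type Blob: AsRef<[u8]>` so future DiskTier/ColdTier
    // implementations can choose mmap/Bytes/Cow/streams instead of
    // allocating an Arc.
    fn get(&self, hash: &BlobHash) -> Option<Arc<[u8]>>;

    // put / put_verified / has / pin / unpin elided.
}
```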
In `@crates/echo-cas/src/memory.rs`:
- Around line 223-234: Add a unit test that verifies pin idempotence: create a
MemoryTier via MemoryTier::new(), put some data with put(...) to get a hash,
call pin(&hash) twice, assert pinned_count() == 1, then unpin(&hash) and assert
is_pinned(&hash) is false; this ensures pin uses set semantics rather than a
reference count (refer to MemoryTier::new, put, pin, pinned_count, unpin,
is_pinned). A test sketch follows this list.
- Around line 154-168: Add a new unit test that exercises the happy path and
idempotence of MemoryTier::put_verified: create a MemoryTier via
MemoryTier::new, compute the correct hash for some test data using blob_hash,
call put_verified(hash, data) and assert it returns Ok, then assert store.len()
== 1, store.byte_count() == data.len(), and that store.get(&hash) returns the
stored bytes; finally call put_verified again with the same hash/data and assert
it still returns Ok and that len() and byte_count() have not increased
(idempotence). Ensure the test is named (e.g., put_verified_happy_path) and
mirrors the style of existing tests (sketched after this list).
- Around line 27-32: Add a doc comment on MemoryTier explaining that byte_count
is a monotonically increasing tally of stored bytes for Phase 1 (no GC) and
therefore is_over_budget() will remain true once exceeded until explicit blob
removals are implemented in Phase 3; when implementing remove/evict in the
future you must decrement byte_count by the removed blob's length (compute via
the Arc<[u8]>::len() at removal time) so the budget can recover, and update the
is_over_budget() doc to state this invariant and recovery condition. A doc sketch appears after this list.
- Around line 98-106: The put method currently calls blob_hash then probes the
blobs map twice via contains_key and insert; change both put and put_verified to
use the HashMap entry API (e.g., blobs.entry(hash)) to perform a single lookup:
on Vacant(entry) insert Arc::from(bytes) and increment byte_count only when you
actually insert, and on Occupied return the existing hash without modifying
byte_count; apply the same entry-based logic in put_verified to avoid the double
lookup and ensure byte_count stays correct.
- Around line 108-118: The put_verified function currently does a double lookup
by calling self.blobs.contains_key(&computed) then insert, which can be replaced
with the HashMap entry API to do a single lookup and update byte_count only when
a new blob is actually inserted. Change put_verified to use
self.blobs.entry(computed) and on Vacant insert Arc::from(bytes) while
incrementing self.byte_count only for the new insertion (avoid incrementing when
the entry is Occupied); reference function name put_verified and map self.blobs
to locate the code to update.
- Around line 86-88: The is_over_budget method currently uses strict >
(self.max_bytes.is_some_and(|max| self.byte_count > max)), so a stored
byte_count equal to max_bytes is considered within budget; update the API docs
to explicitly state the budget is inclusive (i.e., exactly max_bytes is NOT over
budget) and mention the related symbols is_over_budget, with_limits, max_bytes,
and byte_count so callers understand the off-by-one behavior; if you prefer
exclusive semantics instead, change the comparison to >= in is_over_budget and
update tests/docs accordingly (see the doc sketch after this list).
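As concrete shapes for the two test asks above, something like the following could work. This is a sketch mirroring the file's existing test module; it assumes the accessor names mentioned in the prompts (`pinned_count`, `is_pinned`, `len`, `byte_count`) and a `put_verified(hash, bytes) -> Result<(), CasError>` signature:

```rust
#[cfg(test)]
mod suggested_tests {
    use super::*;

    #[test]
    fn pin_is_idempotent() {
        let mut store = MemoryTier::new();
        let hash = store.put(b"pinned blob");

        // Pinning twice must not double-count: set semantics, not refcount.
        store.pin(&hash);
        store.pin(&hash);
        assert_eq!(store.pinned_count(), 1);

        // A single unpin fully releases the pin.
        store.unpin(&hash);
        assert!(!store.is_pinned(&hash));
    }

    #[test]
    fn put_verified_happy_path() {
        let mut store = MemoryTier::new();
        let data = b"verified blob";
        let hash = blob_hash(data);

        assert!(store.put_verified(hash, data).is_ok());
        assert_eq!(store.len(), 1);
        assert_eq!(store.byte_count(), data.len());
        assert_eq!(store.get(&hash).as_deref(), Some(&data[..]));

        // Second insert of identical content is a no-op (idempotence).
        assert!(store.put_verified(hash, data).is_ok());
        assert_eq!(store.len(), 1);
        assert_eq!(store.byte_count(), data.len());
    }
}
```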
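And the budget/byte-count documentation asks could be folded into the method docs roughly like this (field names taken from the prompts):

```rust
impl MemoryTier {
    /// Returns true once stored bytes exceed the configured budget.
    ///
    /// A `byte_count` exactly equal to `max_bytes` is NOT over budget
    /// (the comparison is strict `>`). Note that in Phase 1 `byte_count`
    /// only ever grows (no removal/GC until Phase 3), so once this
    /// returns true it stays true; any future remove/evict must
    /// decrement `byte_count` by the removed blob's length for the
    /// budget to recover.
    pub fn is_over_budget(&self) -> bool {
        self.max_bytes.is_some_and(|max| self.byte_count > max)
    }
}
```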
```diff
@@ -0,0 +1,6 @@
+<!-- SPDX-License-Identifier: Apache-2.0 OR MIND-UCAL-1.0 -->
```
License mismatch: README declares Apache-2.0 OR MIND-UCAL-1.0 but Cargo.toml inherits workspace license Apache-2.0 only.
This is a compliance inconsistency. Cargo.toml → license.workspace = true → root workspace declares license = "Apache-2.0". This README introduces a second license (MIND-UCAL-1.0) that doesn't appear anywhere in the manifest chain. Either:
- Update the workspace/crate license field to match the dual-license, or
- Drop `MIND-UCAL-1.0` from the README header.
Shipping contradictory license declarations is a legal landmine.
```rust
fn put(&mut self, bytes: &[u8]) -> BlobHash {
    let hash = blob_hash(bytes);
    if self.blobs.contains_key(&hash) {
        return hash;
    }
    self.byte_count += bytes.len();
    self.blobs.insert(hash, Arc::from(bytes));
    hash
}
```
🧹 Nitpick | 🔵 Trivial
`put` computes the hash, then probes the map twice on the cold path.
`contains_key` followed by `insert` is two lookups. Use the entry API:
Proposed fix:

```diff
 fn put(&mut self, bytes: &[u8]) -> BlobHash {
     let hash = blob_hash(bytes);
-    if self.blobs.contains_key(&hash) {
-        return hash;
-    }
-    self.byte_count += bytes.len();
-    self.blobs.insert(hash, Arc::from(bytes));
+    if let std::collections::hash_map::Entry::Vacant(e) = self.blobs.entry(hash) {
+        self.byte_count += bytes.len();
+        e.insert(Arc::from(bytes));
+    }
     hash
 }
```

The same applies to `put_verified` (lines 113-115).
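For reference, the `put_verified` analogue of that entry-API fix might look like the following sketch (the signature and a unit-like `CasError::HashMismatch` variant are assumed from the surrounding review):

```rust
fn put_verified(&mut self, expected: BlobHash, bytes: &[u8]) -> Result<(), CasError> {
    let computed = blob_hash(bytes);
    if computed != expected {
        return Err(CasError::HashMismatch);
    }
    // Single map probe: insert only if vacant, and only then grow byte_count.
    if let std::collections::hash_map::Entry::Vacant(e) = self.blobs.entry(computed) {
        self.byte_count += bytes.len();
        e.insert(Arc::from(bytes));
    }
    Ok(())
}
```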
- Upgrade thiserror 1 → 2 (leaf crate, no compatibility concern)
- Add put_verified happy-path test (store, retrieve, idempotence)
- Add pin idempotence test (set semantics: double-pin keeps count at 1)
- Document is_over_budget budget semantics (exclusive) and byte_count monotonic invariant (no removal until Phase 3 GC)
- Remove empty [dev-dependencies] section
- Update bytes 1.11.0 → 1.11.1 (RUSTSEC-2026-0007, workspace-wide)
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@crates/echo-cas/src/memory.rs`:
- Around line 286-296: Add a test that asserts the critical boundary where
stored bytes equal the configured limit remains within budget: create a
MemoryTier via MemoryTier::with_limits(n), call put to store exactly n bytes,
assert !store.is_over_budget(), then put one more byte and assert
store.is_over_budget(); reference the existing with_limits, put, and
is_over_budget methods so the test will catch off-by-one regressions (e.g., `>`
vs `>=`) and ensure the budget semantics remain exclusive. A sketch of this test follows this list.
- Around line 110-120: In put_verified, avoid computing blob_hash(bytes) when
the expected blob is already present: first check
self.blobs.contains_key(&expected) and return Ok(()) if found; only then compute
let computed = blob_hash(bytes) and compare to expected, returning
CasError::HashMismatch on mismatch; if absent, insert into self.blobs and update
self.byte_count — this also removes the redundant second contains_key probe
before insert. Use the existing symbols put_verified, expected, blob_hash,
self.blobs, and self.byte_count to locate and change the logic (sketched below, after the boundary test).
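A sketch of that boundary test, assuming `with_limits` takes the byte budget directly:

```rust
#[test]
fn budget_boundary_is_exclusive() {
    // Budget of exactly 4 bytes.
    let mut store = MemoryTier::with_limits(4);

    // Exactly at the limit: still within budget (strict `>`).
    store.put(b"abcd");
    assert!(!store.is_over_budget());

    // One more byte tips it over.
    store.put(b"e");
    assert!(store.is_over_budget());
}
```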
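And the reordered `put_verified` from the second comment, which skips hashing entirely when the blob is already present (same assumed signature as the earlier sketch):

```rust
fn put_verified(&mut self, expected: BlobHash, bytes: &[u8]) -> Result<(), CasError> {
    // Fast path: the blob is already stored under the expected hash, so
    // there is nothing to verify or insert; skip the BLAKE3 work entirely.
    if self.blobs.contains_key(&expected) {
        return Ok(());
    }
    let computed = blob_hash(bytes);
    if computed != expected {
        return Err(CasError::HashMismatch);
    }
    // Known absent at this point, so insert directly (no second probe).
    self.byte_count += bytes.len();
    self.blobs.insert(computed, Arc::from(bytes));
    Ok(())
}
```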
- Make BlobHash inner field private; add from_bytes() for deserialization
- Optimize Display impl: fold into pre-allocated String, single write_str
- Add FIXME(phase3) on get() return type (Arc<[u8]> coupling)
- Use entry API in put() to eliminate double map lookup
- Optimize put_verified() to skip hashing when blob already stored
- Add budget boundary test for exactly-at-limit off-by-one protection
Summary
- `echo-cas` leaf crate with `BlobStore` trait and `MemoryTier` implementation (`wasm32-unknown-unknown` target checks clean)

What's deferred

- `serde` derives, async methods, iteration API

Test plan

- `cargo check -p echo-cas`
- `cargo test -p echo-cas` — 14/14 pass
- `cargo clippy -p echo-cas -- -D warnings` — zero warnings
- `cargo check -p echo-cas --target wasm32-unknown-unknown`
- `cargo doc -p echo-cas --no-deps`