Commits
19 commits
cdfd328
feat: TTD Hardening Sprint S1 — Gates and Evidence Integrity
Feb 14, 2026
f3dd84a
docs: update CHANGELOG for TTD Hardening Sprint S1
Feb 15, 2026
e4dec7a
fix(ci): fix classify-changes job dependencies
Feb 15, 2026
f5d00a4
fix(ci): use yq to convert policy to JSON to avoid js-yaml dependency
Feb 15, 2026
2da9849
fix(ci): add CI and scripts to det-policy.yaml
Feb 15, 2026
c7e7cbb
docs: add missing docstrings to scripts and benchmarks
Feb 15, 2026
1904654
docs: add missing docs to benchmark and allow for criterion macro
Feb 15, 2026
2d45853
fix: address CodeRabbit feedback on TTD hardening sprint
Feb 15, 2026
5de4666
chore: finalize PR with changelog and roadmap updates
flyingrobots Feb 15, 2026
62c9c3d
fix(ci): classify all repo files and improve evidence robustness
flyingrobots Feb 15, 2026
84828d7
fix(ci): use isolated directories for build reproducibility check
flyingrobots Feb 15, 2026
3faf5bf
fix(ci): ensure wasm target is added in isolated builds
flyingrobots Feb 15, 2026
92cd6f9
fix(ci): use hyphenated wasm filename in reproducibility check
flyingrobots Feb 15, 2026
5915fc0
fix(ci): improve build reproducibility check and evidence generation
flyingrobots Feb 15, 2026
4b67a8f
fix(ci): address PR #283 review feedback — security, scope, and corre…
flyingrobots Feb 15, 2026
86f93ba
docs: add det-gates backlog items to TASKS-DAG (#284, #285, #286, #287)
flyingrobots Feb 15, 2026
0428729
fix(ci): address round-2 review — run_none logic, iter_batched, claims
flyingrobots Feb 15, 2026
1d49446
fix(ci): round-3 review — regex escaping, zero-test guard, claims
flyingrobots Feb 15, 2026
a125ae0
fix(ci): anchor zero-test guard to prevent substring false positives
flyingrobots Feb 15, 2026
277 changes: 277 additions & 0 deletions .github/workflows/det-gates.yml
@@ -0,0 +1,277 @@
# SPDX-License-Identifier: Apache-2.0
# © James Ross Ω FLYING•ROBOTS <https://github.com/flyingrobots>
name: det-gates

on:
  pull_request:
  push:
    branches: [main]

Comment on lines +1 to +9
Contributor @coderabbitai coderabbitai bot Feb 15, 2026

🧹 Nitpick | 🔵 Trivial

No concurrency group — parallel runs from rapid pushes can produce interleaved artifacts and confusing evidence chains.

Rapid pushes to the same branch or multiple PRs from the same head can trigger overlapping workflow runs. Without a concurrency key with cancel-in-progress, stale runs waste CI minutes and their artifacts can intermingle with current runs in artifact storage.

♻️ Proposed addition
 on:
   pull_request:
   push:
     branches: [main]
 
+concurrency:
+  group: det-gates-${{ github.ref }}
+  cancel-in-progress: true
+
 permissions:
   contents: read
   actions: read
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-# SPDX-License-Identifier: Apache-2.0
-# © James Ross Ω FLYING•ROBOTS <https://github.com/flyingrobots>
-name: det-gates
-on:
-  pull_request:
-  push:
-    branches: [main]
+# SPDX-License-Identifier: Apache-2.0
+# © James Ross Ω FLYING•ROBOTS <https://github.com/flyingrobots>
+name: det-gates
+on:
+  pull_request:
+  push:
+    branches: [main]
+concurrency:
+  group: det-gates-${{ github.ref }}
+  cancel-in-progress: true
🤖 Prompt for AI Agents
In @.github/workflows/det-gates.yml around lines 1 - 9, The workflow "det-gates"
currently lacks a concurrency policy, so simultaneous runs triggered by
pull_request or push can overlap; add a concurrency block to the workflow YAML
that defines a stable group key (for example using the workflow name and
github.ref or github.head_ref) and set cancel-in-progress: true to ensure new
runs cancel prior in-progress runs and prevent interleaved artifacts and wasted
CI minutes.

⚠️ Potential issue | 🟠 Major

Missing permissions: block — a security hardening sprint that doesn't lock down its own workflow token is ironic.

This workflow has no permissions: key. On public repos the defaults are read-only, but on forks or private repos the GITHUB_TOKEN may carry write on contents, packages, issues, etc. For a PR whose title is literally "Hardening Sprint", explicitly declaring the minimum required permissions is table stakes. You only need contents: read and potentially actions: read for artifact downloads.

🔒 Proposed fix — least-privilege permissions
 name: det-gates
 
 on:
   pull_request:
   push:
     branches: [main]
 
+permissions:
+  contents: read
+  actions: read
+
 jobs:
🤖 Prompt for AI Agents
In @.github/workflows/det-gates.yml around lines 1 - 9, Add an explicit
top-level permissions block to the det-gates workflow to enforce least
privilege: declare permissions: with contents: read and actions: read (only
those scopes needed) instead of relying on defaults; update the workflow (the
top-level fields around name: det-gates and on:) to include this permissions key
so the GITHUB_TOKEN cannot inadvertently have write access.

Owner Author

Acknowledged — Valid improvement for CI efficiency. Adding a concurrency group with cancel-in-progress would prevent artifact interleaving from rapid pushes. Deferring to a follow-up CI hardening pass.

Contributor

@flyingrobots Understood — you're deferring the concurrency block.

However: This PR is titled "TTD Hardening Sprint S1 — Gates and Evidence Integrity." Deferring a CI robustness fix from the hardening sprint feels like shipping the fire extinguisher without the pin. The artifact interleaving risk isn't theoretical — rapid pushes will race, and your evidence validation in validate-evidence will ingest artifacts from overlapping runs. That's a direct threat to the integrity claims this sprint is meant to harden.

Recommendation: Either:

  1. Include the concurrency block now (2 lines, zero risk), or
  2. Let me open a tracking issue documenting the residual artifact-race risk and linking it to this PR's scope.

Which do you prefer?


🧠 Learnings used
Learnt from: flyingrobots
Repo: flyingrobots/echo PR: 283
File: det-policy.yaml:136-151
Timestamp: 2026-02-15T16:30:15.709Z
Learning: In the det-policy.yaml system, the classify_changes.cjs script only reads class-level `required_gates` definitions; crate-level `required_gates` fields are not consumed by the script and should not be added to individual crate entries.

Learnt from: flyingrobots
Repo: flyingrobots/echo PR: 0
File: :0-0
Timestamp: 2026-01-24T23:22:15.388Z
Learning: The user (flyingrobots) prefers extremely detailed, pedantic code reviews with no mercy — "GOD-TIER NIGHTMARE CODE REVIEWER" style. They want critical scrutiny, exhaustive analysis, identification of landmines, design questions, and actionable demands on every review.

jobs:
  classify-changes:
    name: classify-changes
    runs-on: ubuntu-latest
    outputs:
      run_full: ${{ steps.classify.outputs.run_full }}
      run_reduced: ${{ steps.classify.outputs.run_reduced }}
      run_none: ${{ steps.classify.outputs.run_none }}
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Detect changed files
        id: changed
        env:
          BASE_REF: ${{ github.base_ref }}
          EVENT_NAME: ${{ github.event_name }}
        run: |
          if [ "$EVENT_NAME" = "pull_request" ]; then
            git fetch origin "$BASE_REF" --depth=1
            git diff --name-only "origin/$BASE_REF...HEAD" > changed.txt
          else
            git diff --name-only HEAD~1..HEAD > changed.txt || true
          fi
Comment on lines +28 to +34
Contributor @coderabbitai coderabbitai bot Feb 15, 2026

⚠️ Potential issue | 🟡 Minor

Push-to-main path only inspects the last commit — multi-commit merges or force pushes can slip through unclassified.

Line 33: git diff --name-only HEAD~1..HEAD captures changes from the immediate parent only. For a merge commit this includes all merged changes (correct), but for a force push landing multiple new non-merge commits, only the final commit's diff is analyzed. Files changed in earlier commits won't appear in changed.txt, so they bypass classification entirely.

If branch protection rules prevent force pushes to main, document that assumption. Otherwise, consider using the GitHub-provided ${{ github.event.before }} SHA (via env var) for push events to diff the full push range.

🤖 Prompt for AI Agents
In @.github/workflows/det-gates.yml around lines 28 - 34, The push-to-main path
currently uses `git diff --name-only HEAD~1..HEAD` which only inspects the
immediate parent commit and can miss changes from multi-commit force-pushes;
update the workflow to diff the full push range by using the GitHub event SHAs
(e.g., reference `${{ github.event.before }}` and `${{ github.sha }}` via
environment variables) instead of `HEAD~1..HEAD`, and/or document the assumption
that force pushes to main are disallowed; ensure the block that writes to
`changed.txt` (the branch handling around `EVENT_NAME`, `BASE_REF`, and the `git
diff` invocation) uses the event SHAs for push events so all changed files are
captured.
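The suggested push-range handling can be sketched as a small shell function. This is a hypothetical sketch, not code from the PR: the function name and the null-SHA fallback policy are assumptions, and the two arguments stand in for `${{ github.event.before }}` and `${{ github.sha }}`.

```shell
# Hypothetical sketch of the suggested fix: diff the full pushed range
# rather than HEAD~1..HEAD. Function name and fallback policy are
# assumptions, not part of the PR.
detect_changed_files() {
  before_sha="$1"   # would come from ${{ github.event.before }}
  after_sha="$2"    # would come from ${{ github.sha }}
  zero_sha="0000000000000000000000000000000000000000"
  if [ -z "$before_sha" ] || [ "$before_sha" = "$zero_sha" ]; then
    # Branch creation or unknown history: fall back to the last commit.
    git diff --name-only "HEAD~1..HEAD"
  else
    # Covers every commit in a multi-commit push or force push.
    git diff --name-only "$before_sha..$after_sha"
  fi
}
```

With this shape, a push landing three new commits classifies the union of all their changed files instead of only the final commit's diff.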

Owner Author

Acknowledged — The HEAD~1 fallback for push events only inspects the last commit. For squash merges this is correct; for multi-commit merges it could miss files. Deferring — the primary gate path is PRs, and main pushes serve as a safety net.

Contributor

@flyingrobots Understood — deferring is reasonable if your workflow enforces squash-merge to main. The HEAD~1 diff captures the entire squash commit's changes correctly in that scenario.

Would you like me to open a GitHub issue to track the multi-commit merge edge case for future hardening? That way you have a backlog item if branch-protection rules ever change or if you want defense-in-depth for accidental force-pushes.


✏️ Learnings added
Learnt from: flyingrobots
Repo: flyingrobots/echo PR: 283
File: .github/workflows/det-gates.yml:28-34
Timestamp: 2026-02-15T20:20:25.895Z
Learning: In the det-gates.yml workflow for the flyingrobots/echo repository, the push-to-main path uses `HEAD~1..HEAD` for change detection. The primary gate enforcement happens on PRs; main pushes serve as a safety net only. The workflow assumes squash-merge to main, where `HEAD~1` correctly captures all changes in the squash commit.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: flyingrobots
Repo: flyingrobots/echo PR: 283
File: det-policy.yaml:136-151
Timestamp: 2026-02-15T16:30:15.709Z
Learning: In the det-policy.yaml system, the classify_changes.cjs script only reads class-level `required_gates` definitions; crate-level `required_gates` fields are not consumed by the script and should not be added to individual crate entries.

Learnt from: flyingrobots
Repo: flyingrobots/echo PR: 0
File: :0-0
Timestamp: 2026-01-24T23:22:15.388Z
Learning: The user (flyingrobots) prefers extremely detailed, pedantic code reviews with no mercy — "GOD-TIER NIGHTMARE CODE REVIEWER" style. They want critical scrutiny, exhaustive analysis, identification of landmines, design questions, and actionable demands on every review.

          echo "Changed files:"
          cat changed.txt || true

      - name: Convert policy to JSON
        run: |
          yq -o=json det-policy.yaml > det-policy.json

      - name: Classify path impact from det-policy.yaml
        id: classify
        run: |
          node ./scripts/classify_changes.cjs det-policy.json changed.txt >> $GITHUB_OUTPUT

Comment on lines +10 to +46
Contributor

🛠️ Refactor suggestion | 🟠 Major

No timeout-minutes on any of the 8 jobs — a hung cargo build or benchmark burns 6 hours of CI before GitHub kills it.

GitHub Actions default job timeout is 360 minutes. You have 6 Rust compilation jobs, a Criterion benchmark, and a WASM double-build. Any of these can hang on a linker, a flaky network fetch of crate dependencies, or an infinite-loop regression in the code under test. Without explicit timeouts, a single hung job burns 6 hours of billable CI minutes before auto-cancel, and blocks the entire evidence pipeline since validate-evidence uses always() and waits for everything.

Set timeout-minutes on every job. Typical values: 15–20 for test/build jobs, 30 for benchmarks, 5 for classify-changes and validate-evidence.

⏱️ Proposed fix — add timeout-minutes to each job
   classify-changes:
     name: classify-changes
     runs-on: ubuntu-latest
+    timeout-minutes: 5
     outputs:
   determinism-linux:
     name: G1 determinism (linux)
     needs: classify-changes
     if: needs.classify-changes.outputs.run_full == 'true'
     runs-on: ubuntu-latest
+    timeout-minutes: 20
     steps:
   determinism-macos:
     name: G1 determinism (macos)
     needs: classify-changes
     if: needs.classify-changes.outputs.run_full == 'true'
     runs-on: macos-latest
+    timeout-minutes: 20
     steps:
   static-inspection:
     name: DET-001 Static Inspection
     needs: classify-changes
     if: needs.classify-changes.outputs.run_full == 'true'
     runs-on: ubuntu-latest
+    timeout-minutes: 10
     steps:
   decoder-security:
     name: G2 decoder security tests
     needs: classify-changes
     if: needs.classify-changes.outputs.run_full == 'true' || needs.classify-changes.outputs.run_reduced == 'true'
     runs-on: ubuntu-latest
+    timeout-minutes: 15
     steps:
   perf-regression:
     name: G3 perf regression (criterion)
     needs: classify-changes
     if: needs.classify-changes.outputs.run_full == 'true'
     runs-on: ubuntu-latest
+    timeout-minutes: 30
     steps:
   build-repro:
     name: G4 build reproducibility (wasm)
     needs: classify-changes
     if: needs.classify-changes.outputs.run_full == 'true' || needs.classify-changes.outputs.run_reduced == 'true'
     runs-on: ubuntu-latest
+    timeout-minutes: 30
     steps:
   validate-evidence:
     name: Evidence schema / claim policy
+    timeout-minutes: 5
     needs:
🤖 Prompt for AI Agents
In @.github/workflows/det-gates.yml around lines 10 - 46, The workflow jobs lack
explicit timeouts; add a timeout-minutes field to every job definition (e.g. the
classify-changes job) to avoid 6-hour GitHub default hangs — insert
timeout-minutes: 5 under the classify-changes job, set timeout-minutes: 15–20
for Rust build/test jobs and WASM builds, timeout-minutes: 30 for
Criterion/benchmark jobs, and timeout-minutes: 5 for validate-evidence (and any
short utility jobs); ensure the timeout-minutes key is placed at the top-level
of each job block (same indentation as name and runs-on) so GitHub Actions will
enforce the limits.

  determinism-linux:
    name: G1 determinism (linux)
    needs: classify-changes
    if: needs.classify-changes.outputs.run_full == 'true'
    runs-on: ubuntu-latest
Comment on lines +47 to +51
Contributor @coderabbitai coderabbitai bot Feb 15, 2026

🧹 Nitpick | 🔵 Trivial

No Rust/cargo caching across any of the six jobs — every PR pays full compilation cost from scratch.

Six jobs (determinism-linux, determinism-macos, static-inspection, decoder-security, perf-regression, build-repro) each start with a cold cargo cache. For Rust projects, compilation dominates CI time. Adding Swatinem/rust-cache@v2 (or actions/cache targeting ~/.cargo and target/) to the test/inspection jobs would cut CI time dramatically.

Exception: build-repro should NOT use target caching since it's verifying reproducibility from clean builds. Cargo registry caching (download cache only) is acceptable there.

⚡ Example — add rust-cache to test jobs
      - uses: actions/checkout@v4
      - name: Setup Rust
        uses: dtolnay/rust-toolchain@stable
      - uses: Swatinem/rust-cache@v2  # Add this after toolchain setup
      - name: Run parity tests (linux)
        run: ...

Also applies to: 80-84, 113-117, 147-151, 167-171, 185-189

🤖 Prompt for AI Agents
In @.github/workflows/det-gates.yml around lines 47 - 51, Add Rust/Cargo caching
to the CI jobs that currently start from cold builds: insert
Swatinem/rust-cache@v2 (or an actions/cache setup caching ~/.cargo and target/)
immediately after the Rust toolchain setup step in the jobs determinism-linux,
determinism-macos, static-inspection, decoder-security, and perf-regression to
avoid full recompiles on every PR; for build-repro do NOT cache the target
directory (only allow registry/download caching if desired) so the
reproducibility check still runs from a clean build.

Owner Author

Acknowledged — Adding Swatinem/rust-cache@v2 would significantly reduce CI build times. Deferring to a CI performance optimization pass.

Contributor

Rate Limit Exceeded

@flyingrobots has exceeded the limit for the number of chat messages per hour. Please wait 2 minutes and 3 seconds before sending another message.

    steps:
      - uses: actions/checkout@v4
      - name: Setup Rust
        uses: dtolnay/rust-toolchain@stable
      - name: Run parity tests (linux)
        run: |
          cargo test -p echo-scene-port test_float_parity_with_js -- --nocapture 2>&1 | tee det-linux.log
          grep -q " 0 passed" det-linux.log && echo "FATAL: zero tests matched filter" && exit 1 || true
      - name: Run DIND suite (linux)
        run: |
          node scripts/dind-run-suite.mjs --mode run | tee dind-linux.log
      - name: Create digest table
        env:
          COMMIT_SHA: ${{ github.sha }}
          RUN_ID: ${{ github.run_id }}
        run: |
          mkdir -p artifacts
          echo "target,commit,run_id,digest" > artifacts/digest-table.csv
          echo "linux,${COMMIT_SHA},${RUN_ID},$(sha256sum dind-report.json | cut -d' ' -f1)" >> artifacts/digest-table.csv
      - name: Upload artifacts
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: det-linux-artifacts
          path: |
            det-linux.log
            dind-linux.log
            dind-report.json
            artifacts/digest-table.csv

  determinism-macos:
    name: G1 determinism (macos)
    needs: classify-changes
    if: needs.classify-changes.outputs.run_full == 'true'
    runs-on: macos-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup Rust
        uses: dtolnay/rust-toolchain@stable
      - name: Run parity tests (macos)
        run: |
          cargo test -p echo-scene-port test_float_parity_with_js -- --nocapture 2>&1 | tee det-macos.log
          grep -q " 0 passed" det-macos.log && echo "FATAL: zero tests matched filter" && exit 1 || true
      - name: Run DIND suite (macos)
        run: |
          node scripts/dind-run-suite.mjs --mode run | tee dind-macos.log
      - name: Create digest table
        env:
          COMMIT_SHA: ${{ github.sha }}
          RUN_ID: ${{ github.run_id }}
        run: |
          mkdir -p artifacts
          echo "target,commit,run_id,digest" > artifacts/digest-table.csv
          echo "macos,${COMMIT_SHA},${RUN_ID},$(shasum -a 256 dind-report.json | cut -d' ' -f1)" >> artifacts/digest-table.csv
      - name: Upload artifacts
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: det-macos-artifacts
          path: |
            det-macos.log
            dind-macos.log
            dind-report.json
            artifacts/digest-table.csv

  static-inspection:
    name: DET-001 Static Inspection
    needs: classify-changes
    if: needs.classify-changes.outputs.run_full == 'true'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install ripgrep
        run: sudo apt-get update && sudo apt-get install -y ripgrep
      - name: Run determinism check
        id: det_check
        env:
          DETERMINISM_PATHS: "crates/warp-core crates/warp-geom crates/warp-wasm crates/warp-ffi crates/echo-wasm-abi crates/echo-scene-port crates/echo-scene-codec crates/echo-graph crates/echo-ttd crates/echo-dind-harness crates/echo-dind-tests crates/ttd-browser crates/ttd-protocol-rs crates/ttd-manifest"
        run: |
          ./scripts/ban-nondeterminism.sh | tee static-inspection.log
      - name: Create report
        if: always()
        env:
          DET_OUTCOME: ${{ steps.det_check.outcome }}
        run: |
          if [ "$DET_OUTCOME" = "success" ]; then
            echo '{"claim_id": "DET-001", "status": "PASSED"}' > static-inspection.json
          else
            echo '{"claim_id": "DET-001", "status": "FAILED"}' > static-inspection.json
          fi
      - name: Upload inspection artifacts
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: static-inspection
          path: |
            static-inspection.log
            static-inspection.json

  decoder-security:
    name: G2 decoder security tests
    needs: classify-changes
    if: needs.classify-changes.outputs.run_full == 'true' || needs.classify-changes.outputs.run_reduced == 'true'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup Rust
        uses: dtolnay/rust-toolchain@stable
      - name: Run codec tests
        run: |
          cargo test -p echo-scene-codec --lib cbor::tests -- --nocapture 2>&1 | tee sec-tests.log
          grep -q " 0 passed" sec-tests.log && echo "FATAL: zero tests matched filter" && exit 1 || true
      - name: Upload security artifacts
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: sec-artifacts
          path: |
            sec-tests.log
            docs/determinism/sec-claim-map.json

  perf-regression:
    name: G3 perf regression (criterion)
    needs: classify-changes
    if: needs.classify-changes.outputs.run_full == 'true'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup Rust
        uses: dtolnay/rust-toolchain@stable
      - name: Run benchmarks
        run: |
          cargo bench -p warp-benches --bench materialization_hotpath -- --output-format bencher | tee perf.log
      - name: Upload perf artifacts
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: perf-artifacts
          path: perf.log
Comment on lines 173 to 190
Contributor @coderabbitai coderabbitai bot Feb 15, 2026


⚠️ Potential issue | 🟠 Major

G3 "perf regression" doesn't detect regressions — there is no baseline comparison.

The job runs Criterion benchmarks, captures output, and uploads it. That's a "benchmarks compile and run" gate. The name says "perf regression" but there is no historical baseline, no threshold, no comparison, and no failure criterion tied to performance. A 10x regression in materialization_hotpath passes this gate with a green check.

At minimum, either:

  1. Rename to "G3 perf baseline (criterion)" to set honest expectations, or
  2. Integrate critcmp or bencher.dev with a stored baseline and fail on regressions beyond a threshold.

If option 1 for now, document that regression detection is a future enhancement.

🤖 Prompt for AI Agents
In @.github/workflows/det-gates.yml around lines 167 - 183, The
"perf-regression" workflow job currently only runs Criterion and uploads
perf.log without any baseline comparison; either rename the job to reflect it is
a baseline run (change job name from perf-regression to something like "G3 perf
baseline (criterion)" and update the job display name) and add a step that
echoes a comment noting that regression detection is a future enhancement, or
implement a comparison step that runs critcmp/bencher.dev against a stored
baseline and fails the job when the benchmark "materialization_hotpath"
regresses beyond the configured threshold; locate the job block labeled
perf-regression and modify the name field or append a post-processing step after
the cargo bench command to perform the comparison and set a non-zero exit on
regression.
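Option 2 above can be approximated without extra tooling by comparing two logs in Criterion's bencher output format. This is a sketch under assumptions: the function name, the 10% threshold used in the usage note, and where the baseline log comes from are all illustrative, not part of the PR.

```shell
# Hypothetical sketch: fail when any benchmark in the current bencher-format
# log is slower than the same benchmark in the baseline log by more than
# max_pct percent. Expected line shape:
#   test <name> ... bench: <ns> ns/iter (+/- <dev>)
perf_regressed() {
  baseline="$1"; current="$2"; max_pct="$3"
  awk -v pct="$max_pct" '
    $1 == "test" && $4 == "bench:" {
      val = $5; gsub(",", "", val)
      if (NR == FNR) { base[$2] = val; next }   # first file: record baseline
      if ($2 in base && val + 0 > (base[$2] + 0) * (100 + pct) / 100) {
        printf "REGRESSION: %s %s ns -> %s ns\n", $2, base[$2], val
        bad = 1
      }
    }
    END { exit bad }
  ' "$baseline" "$current"
}
```

In the job, this would run after `cargo bench ... | tee perf.log`, comparing against a baseline `perf.log` restored from a cache of the main branch (an assumption; the PR as written stores no baseline).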

Owner Author

Acknowledged — G3 currently gates on benchmark compilation and execution, not regression detection. True regression detection requires baseline storage (e.g., criterion's baseline comparison or GitHub Action benchmark caching). This is a known limitation and a future enhancement.

Contributor

Rate Limit Exceeded

@flyingrobots has exceeded the limit for the number of chat messages per hour. Please wait 1 minute and 51 seconds before sending another message.


  build-repro:
    name: G4 build reproducibility (wasm)
    needs: classify-changes
    if: needs.classify-changes.outputs.run_full == 'true' || needs.classify-changes.outputs.run_reduced == 'true'
    runs-on: ubuntu-latest
    steps:
      - name: Setup Rust (Global)
        uses: dtolnay/rust-toolchain@stable
        with:
          targets: wasm32-unknown-unknown
      - name: Checkout Build 1
        uses: actions/checkout@v4
        with:
          path: build1
      - name: Build 1
        run: |
          cd build1
          rustup target add wasm32-unknown-unknown
          cargo build --release --target wasm32-unknown-unknown -p ttd-browser
          sha256sum target/wasm32-unknown-unknown/release/ttd_browser.wasm > ../hash1.txt
          cp target/wasm32-unknown-unknown/release/ttd_browser.wasm ../build1.wasm
      - name: Checkout Build 2
        uses: actions/checkout@v4
        with:
          path: build2
      - name: Build 2
        run: |
          cd build2
          rustup target add wasm32-unknown-unknown
          cargo build --release --target wasm32-unknown-unknown -p ttd-browser
          sha256sum target/wasm32-unknown-unknown/release/ttd_browser.wasm > ../hash2.txt
          cp target/wasm32-unknown-unknown/release/ttd_browser.wasm ../build2.wasm
Comment on lines +206 to +223
Contributor @coderabbitai coderabbitai bot Feb 15, 2026

🧹 Nitpick | 🔵 Trivial

Redundant rustup target add wasm32-unknown-unknown in both build steps — already declared in the toolchain setup.

Lines 202 and 213 both run rustup target add wasm32-unknown-unknown, but the dtolnay/rust-toolchain@stable step on lines 191–194 already specifies targets: wasm32-unknown-unknown. The target is already installed by the time these steps execute. Not a bug, but unnecessary noise in a hardening-grade workflow.

🧹 Proposed cleanup
       - name: Build 1
         run: |
           cd build1
-          rustup target add wasm32-unknown-unknown
           cargo build --release --target wasm32-unknown-unknown -p ttd-browser
           sha256sum target/wasm32-unknown-unknown/release/ttd_browser.wasm > ../hash1.txt
           cp target/wasm32-unknown-unknown/release/ttd_browser.wasm ../build1.wasm
       - name: Checkout Build 2
         uses: actions/checkout@v4
         with:
           path: build2
       - name: Build 2
         run: |
           cd build2
-          rustup target add wasm32-unknown-unknown
           cargo build --release --target wasm32-unknown-unknown -p ttd-browser
🤖 Prompt for AI Agents
In @.github/workflows/det-gates.yml around lines 199 - 216, Remove the redundant
rustup target installation lines from the "Build 1" and "Build 2" steps: delete
the `rustup target add wasm32-unknown-unknown` commands in the run blocks for
the steps named "Build 1" and "Build 2" because the toolchain step
(dtolnay/rust-toolchain@stable with targets: wasm32-unknown-unknown) already
installs that target; keep the remaining build commands (cargo build, sha256sum,
cp) unchanged.

Owner Author

Acknowledged — The rustup target add in each build step is redundant with the toolchain setup. Kept for now as a defensive measure since the steps run in separate checkout directories. Can clean up in a follow-up.

Contributor

Rate Limit Exceeded

@flyingrobots has exceeded the limit for the number of chat messages per hour. Please wait 1 minute and 58 seconds before sending another message.

      - name: Compare hashes
        run: |
          diff hash1.txt hash2.txt || (echo "Reproducibility failure: Hashes differ!" && exit 1)
          echo "Hashes match: $(cat hash1.txt)"
      - name: Upload build artifacts
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: build-repro-artifacts
          path: |
            hash1.txt
            hash2.txt
            build1.wasm
            build2.wasm

  validate-evidence:
    name: Evidence schema / claim policy
    needs:
      - classify-changes
      - determinism-linux
      - determinism-macos
      - static-inspection
      - decoder-security
      - perf-regression
      - build-repro
    if: always() && needs.classify-changes.outputs.run_none != 'true'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Download all artifacts
        uses: actions/download-artifact@v4
        with:
          path: gathered-artifacts
      - name: Verify artifact presence
        env:
          RUN_FULL: ${{ needs.classify-changes.outputs.run_full }}
        run: |
          ls -R gathered-artifacts
          # Always required (run on both full and reduced)
          [ -d gathered-artifacts/sec-artifacts ] || (echo "Missing sec-artifacts" && exit 1)
          [ -d gathered-artifacts/build-repro-artifacts ] || (echo "Missing build-repro-artifacts" && exit 1)
          # Only required when run_full (these jobs are skipped for run_reduced)
          if [ "$RUN_FULL" = "true" ]; then
            [ -d gathered-artifacts/det-linux-artifacts ] || (echo "Missing det-linux-artifacts" && exit 1)
            [ -d gathered-artifacts/det-macos-artifacts ] || (echo "Missing det-macos-artifacts" && exit 1)
            [ -d gathered-artifacts/perf-artifacts ] || (echo "Missing perf-artifacts" && exit 1)
            [ -d gathered-artifacts/static-inspection ] || (echo "Missing static-inspection" && exit 1)
          fi
Comment on lines +239 to +271
Contributor @coderabbitai coderabbitai bot Feb 15, 2026

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

cd /tmp && find . -name "validate_claims.cjs" 2>/dev/null | head -5

Repository: flyingrobots/echo

Length of output: 43


🏁 Script executed:

git ls-files | grep -E "(validate_claims|det-gates)" | head -20

Repository: flyingrobots/echo

Length of output: 120


🏁 Script executed:

cat scripts/validate_claims.cjs

Repository: flyingrobots/echo

Length of output: 2442


Content validation for FAILED evidence is missing — validate_claims.cjs only checks VERIFIED claims.

The validate-evidence job (lines 247–261) performs directory presence checks, not content validation. The static-inspection artifact passes this gate so long as the directory exists, regardless of whether its report contains "status": "FAILED".

The script at scripts/validate_claims.cjs (line 267) does not provide the missing safeguard. Its logic is:

for (const claim of data.claims) {
  if (claim.status === 'VERIFIED') {
    // validate required pointers...
  }
}

It exclusively validates claims marked VERIFIED and completely ignores claims with FAILED status. A FAILED static-inspection report survives the entire validation pipeline unchallenged.

Add an explicit content check in the validate-evidence job to reject evidence packs containing any failed claims, or confirm this gap is handled downstream (and document it visibly in the workflow).

🤖 Prompt for AI Agents
In @.github/workflows/det-gates.yml around lines 229 - 261, The workflow's
validate-evidence job and the validate_claims.cjs logic currently only check for
directory presence and validate claims with status === 'VERIFIED', ignoring any
claims with status === 'FAILED'; update the pipeline to explicitly fail when any
FAILED claims exist by either (A) modifying validate_claims.cjs (the loop over
data.claims and the claim.status check) to detect claim.status === 'FAILED' and
throw/exit non‑zero with a clear message, or (B) add a step in the
validate-evidence job that scans the gathered-artifacts (e.g., static-inspection
JSON reports) for any "status":"FAILED" and exit 1 if found; reference the
validate-evidence job and the validate_claims.cjs data.claims/claim.status logic
to locate where to implement this check and ensure the workflow fails visibly
when failed claims are present.
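
Option (A) above could be sketched as follows. This is a hypothetical illustration only: the `{ claims: [{ id, status }] }` shape and the sample claim IDs are assumptions, not the repo's actual evidence schema.

```javascript
// Sketch of a FAILED-claim guard for validate_claims.cjs.
// Assumed schema: { claims: [{ id, status, ... }] }.
// Returns the failed claims so the caller can exit non-zero visibly.
function findFailedClaims(data) {
  if (!Array.isArray(data.claims)) {
    throw new Error('evidence pack has no claims array');
  }
  return data.claims.filter((claim) => claim.status === 'FAILED');
}

// Example evidence pack (hypothetical claim IDs):
const failed = findFailedClaims({
  claims: [
    { id: 'DET-001', status: 'VERIFIED' },
    { id: 'SEC-002', status: 'FAILED' },
  ],
});
if (failed.length > 0) {
  console.error(
    `Evidence pack contains ${failed.length} FAILED claim(s): ` +
      failed.map((c) => c.id).join(', ')
  );
  // In CI this would be: process.exit(1)
}
```

In the workflow, a step like this would run right after the evidence pack is generated, so a FAILED static-inspection report can no longer pass on directory presence alone.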

Owner Author

By design — validate_claims.cjs verifies that VERIFIED claims have valid immutable pointers (workflow, run_id, commit_sha, artifact_name). FAILED/UNVERIFIED claims don't claim to have evidence, so pointer validation is not applicable. The evidence pack correctly records the status.
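
The pointer validation described in this reply might look roughly like the following. A hedged sketch: the four pointer field names come from the comment above, but the exact schema and helper are assumed, not taken from the actual script.

```javascript
// Sketch of per-claim pointer validation, applied to VERIFIED claims only.
// Required pointer fields are those named in the reply; the claim shape is assumed.
const REQUIRED_POINTERS = ['workflow', 'run_id', 'commit_sha', 'artifact_name'];

function validateVerifiedClaim(claim) {
  const missing = REQUIRED_POINTERS.filter((key) => !claim[key]);
  if (missing.length > 0) {
    throw new Error(`claim ${claim.id}: missing pointers: ${missing.join(', ')}`);
  }
}

// FAILED/UNVERIFIED claims are skipped by design — they claim no evidence,
// so there are no pointers to validate.
const claim = {
  id: 'DET-001',
  status: 'VERIFIED',
  workflow: 'det-gates.yml',
  run_id: '1234567890',
  commit_sha: 'a125ae0',
  artifact_name: 'static-inspection',
};
if (claim.status === 'VERIFIED') validateVerifiedClaim(claim);
```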

Contributor

Rate Limit Exceeded

@flyingrobots has exceeded the limit on chat messages per hour. Please wait 1 minute and 57 seconds before sending another message.

      - name: Generate evidence pack
        run: |
          ./scripts/generate_evidence.cjs gathered-artifacts
      - name: Validate evidence pointers
        run: |
          ./scripts/validate_claims.cjs gathered-artifacts/evidence.json
Comment on lines +272 to +277
Contributor

@coderabbitai coderabbitai bot Feb 15, 2026


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# First, get the full workflow file to check invocation patterns
cat -n .github/workflows/det-gates.yml | head -280 | tail -50

Repository: flyingrobots/echo

Length of output: 2511


🏁 Script executed:

#!/bin/bash
# Also check around line 45 to verify the classify step
cat -n .github/workflows/det-gates.yml | sed -n '40,50p'

Repository: flyingrobots/echo

Length of output: 518


🏁 Script executed:

#!/bin/bash
# Check execute permissions on all CJS scripts in scripts/
fd -e cjs . scripts/ --exec stat -c '%A %n' {} \;

Repository: flyingrobots/echo

Length of output: 204


🏁 Script executed:

#!/bin/bash
# Check git filemode tracking for CJS scripts
fd -e cjs . scripts/ --exec git ls-files -s {} \;

Repository: flyingrobots/echo

Length of output: 379


Inconsistent script invocation patterns — lines 264 & 267 must use the node prefix, as line 45 does.

Lines 264 and 267 invoke .cjs scripts directly (./scripts/generate_evidence.cjs, ./scripts/validate_claims.cjs), while line 45 uses node ./scripts/classify_changes.cjs. Although execute bits are currently set in git metadata (100755), direct invocation creates a fragility hazard: git migrations, archive extractions, or permission resets during CI environment provisioning can strip execute bits without warning. For a hardening sprint, inconsistency is unacceptable. Enforce node across all .cjs script invocations.

Proposed fix
       - name: Generate evidence pack
         run: |
-          ./scripts/generate_evidence.cjs gathered-artifacts
+          node ./scripts/generate_evidence.cjs gathered-artifacts
       - name: Validate evidence pointers
         run: |
-          ./scripts/validate_claims.cjs gathered-artifacts/evidence.json
+          node ./scripts/validate_claims.cjs gathered-artifacts/evidence.json
🤖 Prompt for AI Agents
In @.github/workflows/det-gates.yml around lines 262 - 267, The workflow uses
direct execution of .cjs scripts which is fragile; update the two script steps
that call ./scripts/generate_evidence.cjs and ./scripts/validate_claims.cjs to
invoke them with Node (e.g., node ./scripts/generate_evidence.cjs
gathered-artifacts and node ./scripts/validate_claims.cjs
gathered-artifacts/evidence.json) to match the existing pattern used by
classify_changes.cjs and ensure execution does not rely on filesystem execute
bits.

Owner Author


Acknowledged — will review the invocation patterns in a follow-up pass.

Contributor

Rate Limit Exceeded

@flyingrobots has exceeded the limit on chat messages per hour. Please wait 1 minute and 53 seconds before sending another message.

57 changes: 56 additions & 1 deletion CHANGELOG.md
@@ -5,6 +5,61 @@

## Unreleased

## [0.1.3]

### Fixed (Sprint S1)

- **CI Security:** Hardened `det-gates` workflow against script injection by using
environment variables for all `github.*` interpolations (branch refs, SHA,
run ID, event name).
- **WASM Reproducibility:** Implemented bit-exact reproducibility checks (G4)
for `ttd-browser` WASM using hash comparison of clean isolated rebuilds.
- **Static Inspection:** Added automated CI guard for `DET-001` covering all 14
DET_CRITICAL crate paths (expanded from `echo-wasm-abi` only). Report now
conditional on check outcome (PASSED/FAILED).
- **Evidence Validation:** Made artifact presence checks in `validate-evidence`
conditional on classification tier; added `det-macos-artifacts` check;
`run_reduced` and `DET_NONCRITICAL` paths no longer hard-fail.
- **Policy Classification:** Promoted `warp-benches` from DET_NONCRITICAL to
DET_IMPORTANT so benchmark crate changes trigger reduced gates.
- **Benchmark Correctness:** Replaced `let _ =` with `.unwrap()` on all
`bus.emit()` calls; migrated `iter_with_setup` to `iter_batched`.
- **CBOR Robustness:** Expanded negative security tests for `ProjectionKind`
and `LabelAnchor` enum tags and optimized `MAX_OPS` boundary check.
- **Evidence Integrity:** Enhanced `generate_evidence.cjs` and `validate_claims.cjs`
with stricter semantic validation (SHAs, run IDs) and artifact existence checks.
- **Script Quality:** Replaced `process.exit(1)` with `throw` in
`classify_changes.cjs`; removed dead import; exported functions for testing.
- **Governance:** Moved `sec-claim-map.json` to `docs/determinism/`, formalized
gate states in `RELEASE_POLICY.md`, tightened claim statements in
`CLAIM_MAP.yaml`.

## [0.1.2] — 2026-02-14

### Added — TTD Hardening Sprint S1 (Gates & Evidence)

- **Path-Aware CI Gates:** Implemented `det-policy.yaml` and `classify_changes.cjs`
to classify workspace crates (DET_CRITICAL/IMPORTANT/NONCRITICAL) and drive
selective CI gate triggering (G1-G4).
- **Hardening Gates (G1-G4):**
- **G1 (Determinism):** Integrated float parity tests and the DIND (Deterministic
Ironclad Nightmare Drills) suite on both Linux and macOS.
- **G2 (Security):** Added negative security tests for the CBOR decoder
(MAX_OPS, invalid versions/enums, truncated payloads).
- **G3 (Performance):** Created `materialization_hotpath` Criterion benchmark
in `warp-benches` to track materialization overhead.
- **G4 (Build):** Added WASM build reproducibility checks verifying bit-exact
artifacts across clean rebuilds.
- **Evidence Integrity:** Added `generate_evidence.cjs` and `validate_claims.cjs`
to ensure all `VERIFIED` claims are backed by immutable CI artifacts (run IDs,
commit SHAs).
- **Static Inspection:** Integrated `DET-001` automated static inspection into CI
to verify zero-HashMap usage in deterministic guest paths.
- **Governance:** Published `RELEASE_POLICY.md` (staging/prod blockers) and
`ROLLBACK_TTD.md` (commit-ordered rollback sequences).
- **Security Claim Mapping:** Exported `sec-claim-map.json` mapping decoder
controls to explicit negative test cases.

### Added — Deterministic Scene Data (TTD)

- **Scene Rendering Port (`echo-scene-port`):** Defined the core data model for
Expand Down Expand Up @@ -101,7 +156,7 @@
- Added 1 s cooldown after the read loop exits to prevent tight reconnect loops
when the hub accepts connections but immediately closes them.

### Fixed
### Fixed (Legacy)

- **Security:** upgraded `bytes` 1.11.0 → 1.11.1 to fix RUSTSEC-2026-0007
(integer overflow in `BytesMut::reserve`).