
Add proto-array for incremental fork choice #235

Draft
pablodeymo wants to merge 3 commits into main from proto-array-fork-choice

Conversation

@pablodeymo
Collaborator

Summary

  • Introduce ProtoArray + VoteTracker in the fork_choice crate for incremental LMD-GHOST head computation
  • Replace the O(validators × chain_depth) per-call recomputation in update_head with delta-based weight propagation: O(changed_votes) + O(nodes) single backward pass + O(depth) head lookup
  • update_safe_target unchanged — runs once/slot with a different attestation pool, not worth optimizing in this phase
  • Debug oracle (#[cfg(debug_assertions)]) runs both implementations in parallel and asserts equality

Design

  • ForkChoiceState struct wraps ProtoArray + VoteTracker, owned by BlockChainServer
  • Initialized from live chain at startup via ForkChoiceState::from_store()
  • on_block_core registers new blocks and prunes on finalization
  • update_head uses proto-array; falls back to spec impl for tree visualization (log_tree=true, 1x/slot)
  • 13 unit tests for proto-array (linear chains, forks, tiebreaks, pruning, vote changes)
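The incremental head lookup this design relies on can be sketched as follows. This is a minimal illustration of the O(depth) walk over best_child pointers; ProtoNode and find_head are stand-ins, not the PR's actual types (real nodes also carry weight, parent, and root):

```rust
// Minimal sketch: find_head follows best_child pointers from the justified
// anchor to the tip in O(depth). Fields are illustrative stand-ins.

struct ProtoNode {
    best_child: Option<usize>,
}

struct ProtoArray {
    nodes: Vec<ProtoNode>,
}

impl ProtoArray {
    /// Walk best_child pointers from the anchor to the tip: O(depth).
    fn find_head(&self, anchor: usize) -> usize {
        let mut idx = anchor;
        while let Some(child) = self.nodes[idx].best_child {
            idx = child;
        }
        idx
    }
}

fn main() {
    // anchor(0) -> a(1) -> b(2): best_child pointers chain to the tip.
    let pa = ProtoArray {
        nodes: vec![
            ProtoNode { best_child: Some(1) },
            ProtoNode { best_child: Some(2) },
            ProtoNode { best_child: None },
        ],
    };
    assert_eq!(pa.find_head(0), 2);
    println!("head index: {}", pa.find_head(0));
}
```

The weight-propagation pass only maintains these best_child pointers; the per-call head lookup itself never touches weights.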

Test plan

  • make fmt — clean
  • make lint — clean
  • All 15 fork_choice unit tests pass (13 new + 2 existing)
  • All 27 fork choice spec tests pass (debug oracle validates every update_head call)
  • All 8 signature spec tests pass
  • Devnet validation

  Replace the O(validators × chain_depth) per-call LMD-GHOST recomputation
  in update_head with an incremental proto-array that tracks vote deltas and
  propagates weights in a single backward pass over a flat node array.

  Proto-array serves update_head only (up to 2 calls/slot). update_safe_target
  keeps the existing spec implementation since it runs once per slot and uses
  a different attestation pool (known + new merged).

  A debug oracle assertion (#[cfg(debug_assertions)]) runs both implementations
  in parallel and asserts equality, validated by all 27 fork choice spec tests.
@github-actions

🤖 Kimi Code Review

Security & Correctness Issues

1. Critical: Pruning Race Condition (store.rs:668-672)

if let Some(finalized) = finalized {
    fc.proto_array.prune(finalized.root);
    fc.vote_tracker.reset();
}

The vote tracker is reset after pruning, but votes may reference pruned nodes. This creates dangling references. Reset the vote tracker before pruning to prevent validators from voting for non-existent blocks.
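A minimal sketch of the suggested ordering, with stand-in types (the real VoteTracker, ProtoArray, and prune logic differ): clearing votes first guarantees no vote can reference a node that prune() removes.

```rust
// Illustrative fix sketch: reset the vote tracker BEFORE pruning so no vote
// can point at a pruned node. Types and prune logic are placeholders.

struct VoteTracker {
    votes: Vec<Option<u64>>, // latest voted root per validator
}

impl VoteTracker {
    fn reset(&mut self) {
        self.votes.iter_mut().for_each(|v| *v = None);
    }
}

struct ProtoArray {
    roots: Vec<u64>,
}

impl ProtoArray {
    fn prune(&mut self, finalized_root: u64) {
        // Placeholder pruning: keep the finalized root and everything after it.
        self.roots.retain(|&r| r >= finalized_root);
    }
}

fn main() {
    let mut vt = VoteTracker { votes: vec![Some(1), Some(7)] };
    let mut pa = ProtoArray { roots: vec![1, 7, 9] };

    // Reset first, then prune: afterwards no vote references a removed node.
    vt.reset();
    pa.prune(7);

    assert!(vt.votes.iter().all(Option::is_none));
    assert_eq!(pa.roots, vec![7, 9]);
    println!("ok");
}
```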

2. Index Overflow in VoteTracker (proto_array.rs:174-176)

if id >= self.votes.len() {
    self.votes.resize(id + 1, None);
}

Validator IDs can be arbitrarily large (up to u64::MAX), causing unbounded memory growth. Add a hard cap:

const MAX_VALIDATORS: usize = 1_000_000; // Adjust based on network
if id >= MAX_VALIDATORS {
    return deltas; // Or return error
}

3. Silent Failure on Missing Blocks (proto_array.rs:186-190)

if let Some(idx) = proto_array.get_index(&new_root) {
    deltas[idx] += 1;
}

Votes for unknown blocks are silently dropped. This could lead to consensus divergence. Log warnings or return errors for unknown blocks.

4. Integer Overflow in Weight Calculation (proto_array.rs:73)

self.nodes[i].weight += deltas[i];

i64 weights can overflow with large validator sets. Use saturating arithmetic:

self.nodes[i].weight = self.nodes[i].weight.saturating_add(deltas[i]);
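The behaviour of the saturating variant is easy to verify in isolation; near-overflow weights clamp at the i64 bounds instead of wrapping (which is the silent failure mode in release builds):

```rust
fn main() {
    let weight: i64 = i64::MAX - 1;
    let delta: i64 = 5;

    // Release builds wrap on overflow with plain `+`; saturating_add clamps.
    let safe = weight.saturating_add(delta);
    assert_eq!(safe, i64::MAX);

    // Subtraction saturates symmetrically at i64::MIN.
    assert_eq!(i64::MIN.saturating_sub(1), i64::MIN);

    println!("clamped at i64::MAX: {}", safe == i64::MAX);
}
```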

Performance & Design Issues

5. Inefficient Pruning (proto_array.rs:117-150)

The pruning algorithm is O(n²) due to the nested loop checking descendants. Use a more efficient approach:

let mut stack = vec![finalized_idx];
while let Some(idx) = stack.pop() {
    keep[idx] = true;
    // Add children to stack (requires maintaining child indices)
}
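The snippet above omits the child-index bookkeeping it mentions. A self-contained version of the O(n) approach, under the assumption that each node stores its parent index (as the proto-array does), inverts the parent pointers into child lists once and then marks descendants with a DFS:

```rust
// O(n) descendant marking: invert parent pointers into child lists, then DFS
// from the finalized index. Function name and shape are illustrative.

fn mark_descendants(parents: &[Option<usize>], finalized_idx: usize) -> Vec<bool> {
    let n = parents.len();

    // One pass to build child lists from the parent pointers.
    let mut children = vec![Vec::new(); n];
    for (idx, parent) in parents.iter().enumerate() {
        if let Some(p) = *parent {
            children[p].push(idx);
        }
    }

    // DFS from the finalized node; each node is pushed at most once: O(n).
    let mut keep = vec![false; n];
    let mut stack = vec![finalized_idx];
    while let Some(idx) = stack.pop() {
        keep[idx] = true;
        stack.extend(&children[idx]);
    }
    keep
}

fn main() {
    // 0 -> 1 -> {2, 3}; node 4 is a sibling branch off 0 that gets pruned.
    let parents = [None, Some(0), Some(1), Some(1), Some(0)];
    let keep = mark_descendants(&parents, 1);
    assert_eq!(keep, [false, true, true, true, false]);
    println!("{:?}", keep);
}
```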

6. Redundant Block Registration (store.rs:675)

Blocks are registered in both Store::insert_signed_block and fc.proto_array.on_block. Ensure consistency: the proto-array registration should happen after successful block processing so that invalid blocks are never registered.

7. Debug Oracle Overhead (store.rs:131-145)

The debug assertion runs the full spec implementation on every head update. This is expensive in production. Consider:

  • Making it conditional on a debug flag
  • Sampling the check instead of running every time
  • Moving to a separate validation mode
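The sampling option could look like the following hypothetical sketch, where only every Nth update_head call pays for the full spec recomputation (the counter, rate, and Oracle name are illustrative):

```rust
// Illustrative sampling gate for the debug oracle: run the expensive spec
// comparison on only 1 in `sample_every` update_head calls.

struct Oracle {
    calls: u64,
    sample_every: u64,
}

impl Oracle {
    fn should_check(&mut self) -> bool {
        self.calls += 1;
        self.calls % self.sample_every == 0
    }
}

fn main() {
    let mut oracle = Oracle { calls: 0, sample_every: 8 };

    // 64 head updates at a 1-in-8 rate: the spec recomputation runs 8 times.
    let checked = (0..64).filter(|_| oracle.should_check()).count();
    assert_eq!(checked, 8);
    println!("checked {} of 64 calls", checked);
}
```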

API & Safety Issues

8. Missing Error Handling in on_block (store.rs:664-672)

The pruning operation can fail if the finalized root doesn't exist in the proto-array, but this isn't handled. Add proper error handling:

if let Some(finalized) = finalized {
    if fc.proto_array.get_index(&finalized.root).is_none() {
        warn!("Finalized root not found in proto-array: {:?}", finalized.root);
    } else {
        fc.vote_tracker.reset();
        fc.proto_array.prune(finalized.root);
    }
}

9. Inconsistent State After Pruning (store.rs:664-672)

The store and proto-array can become inconsistent if pruning fails. Consider making the operations atomic or adding rollback mechanisms.

Minor Issues

10. Test Coverage Gaps

  • Missing tests for edge cases in pruning (empty tree, single node)
  • No tests for validator ID overflow
  • No tests for duplicate block registration with different parameters

11. Documentation

The ForkChoiceState::from_store method should document that it performs an O(n log n) sort operation which could be expensive for large chains.

Recommendations

  1. Immediate fixes needed:

    • Fix pruning race condition (Item 1)
    • Add validator ID bounds checking (Item 2)
    • Handle missing blocks in vote processing (Item 3)
  2. Performance improvements:

    • Optimize pruning algorithm (Item 5)
    • Consider caching child indices for faster traversal
  3. Safety improvements:

    • Add metrics for dropped votes due to missing blocks
    • Add validation that proto-array and store remain consistent

Automated review by Kimi (Moonshot AI) · custom prompt

@github-actions

🤖 Codex Code Review

  1. High – vote loss when attestation head is unknown to proto-array
    proto_array.rs:223, proto_array.rs:228
    VoteTracker::compute_deltas updates self.votes[id] = Some(new_root) even when new_root is not in proto_array (no idx, so no +1 applied).
    If that block is inserted later, the next delta pass sees old_root == new_root and skips, so this validator’s weight is never added. This can permanently underweight branches and miscompute LMD GHOST head (consensus risk).
    Suggested fix: only commit self.votes[id] when new_root exists in proto_array, or keep a pending-unknown vote path and apply once the block is known.

  2. Medium – silent acceptance of missing parents can create disconnected nodes
    proto_array.rs:36, proto_array.rs:42
    on_block comment says parent must exist, but implementation silently sets parent: None when parent is absent. That can hide caller/state bugs and create orphan subtrees that never contribute to head selection.
    Suggested fix: return Result (or at least debug_assert!) when parent_root != H256::ZERO and parent is missing.
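The first suggested fix (commit the tracker's vote only when the target block is already known) could be sketched as follows; all names, types, and the root-to-index map are illustrative, not the PR's actual API:

```rust
// Sketch: only commit votes[id] when new_root exists in the proto-array, so a
// later delta pass still applies the +1 once the block arrives.
use std::collections::HashMap;

struct VoteTracker {
    votes: Vec<Option<u64>>, // latest committed vote per validator id
}

impl VoteTracker {
    fn record_vote(
        &mut self,
        id: usize,
        new_root: u64,
        indices: &HashMap<u64, usize>, // block root -> proto-array index
        deltas: &mut [i64],
    ) {
        let Some(&idx) = indices.get(&new_root) else {
            // Unknown block: leave votes[id] untouched so the vote is not lost.
            return;
        };
        deltas[idx] += 1;
        self.votes[id] = Some(new_root);
    }
}

fn main() {
    let mut indices = HashMap::new();
    indices.insert(0xAA_u64, 0usize);

    let mut vt = VoteTracker { votes: vec![None] };
    let mut deltas = vec![0i64; 2];

    // Vote for a root the proto-array doesn't know yet: nothing is committed.
    vt.record_vote(0, 0xBB, &indices, &mut deltas);
    assert_eq!(vt.votes[0], None);

    // Once the block arrives (index 1 here), the same vote applies cleanly.
    indices.insert(0xBB, 1);
    vt.record_vote(0, 0xBB, &indices, &mut deltas);
    assert_eq!(deltas, vec![0, 1]);
    assert_eq!(vt.votes[0], Some(0xBB));
    println!("ok");
}
```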

Briefly: the incremental proto-array integration and debug oracle comparison in store.rs:99 are strong additions; structure is generally clean.

I couldn’t run tests in this environment because cargo/rustup needs writable toolchain temp files and the sandbox denied that.


Automated review by OpenAI Codex · custom prompt

@github-actions

🤖 Claude Code Review



PR Review: Add proto-array for incremental fork choice

Overall, the architecture is clean and the debug oracle is an excellent safety net. However, there are two correctness bugs in proto_array.rs that could cause head divergence in production release builds.


Bug 1 — Stale best_child when the current best loses votes (proto_array.rs:175–191)

maybe_update_best_child is a monotonic "update if strictly better" function. It never demotes the current best_child. This breaks correctness in the incremental (non-reset) path.

Concrete scenario — anchor has two children A (index 2, weight 2) and B (index 3, weight 1). Validator 0 moves from A to B, producing deltas A=-1, B=+1:

Backward pass:

  • i=3 (B): B.weight = 1+1 = 2. maybe_update_best_child(anchor, B): compares (2, B_root) vs (2, A_root) where A's delta hasn't been applied yet. If A_root > B_root, A stays as best_child.
  • i=2 (A): A.weight = 2-1 = 1. maybe_update_best_child(anchor, A): compares (1, A_root) vs (1, A_root) (current best is still A). A > A is false — no update.

Final state: anchor.best_child = A (weight 1), but B has weight 2. find_head returns A instead of B.

The debug oracle would catch this when triggered but the 27 spec tests likely don't exercise a vote-shift scenario between same-height siblings with A_root > B_root. In release builds the oracle is compiled out.

Fix options:

  • Clear all best_child pointers at the start of apply_score_changes and rebuild them during the backward pass (simple, fully correct).
  • Track a children list per node and rescan on each weight change (heavier but standard in other proto-array implementations like Lighthouse).
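The first fix option can be demonstrated end-to-end on the vote-shift scenario above. This is a self-contained sketch, not the PR's actual code; clearing best_child before the backward pass means every comparison uses post-delta weights:

```rust
// Fix sketch: clear all best_child pointers at the start of the backward pass,
// then rebuild them as weights are updated. Names and fields are illustrative.

struct Node {
    root: u64,             // stands in for the H256 block root (tiebreak key)
    parent: Option<usize>,
    weight: i64,
    best_child: Option<usize>,
}

fn apply_score_changes(nodes: &mut [Node], deltas: &mut [i64]) {
    // Forget stale winners so the pass below reflects current weights only.
    for node in nodes.iter_mut() {
        node.best_child = None;
    }
    for i in (0..nodes.len()).rev() {
        nodes[i].weight += deltas[i];
        let Some(p) = nodes[i].parent else { continue };
        let d = deltas[i];
        deltas[p] += d; // propagate the child's delta up to the parent
        let better = match nodes[p].best_child {
            None => true,
            Some(b) => (nodes[i].weight, nodes[i].root) > (nodes[b].weight, nodes[b].root),
        };
        if better {
            nodes[p].best_child = Some(i);
        }
    }
}

fn main() {
    // anchor(0) with children A (idx 1) and B (idx 2); A_root > B_root, the
    // ordering that kept the stale best_child without the reset.
    let mut nodes = vec![
        Node { root: 0, parent: None, weight: 3, best_child: Some(1) },
        Node { root: 9, parent: Some(0), weight: 2, best_child: None }, // A
        Node { root: 1, parent: Some(0), weight: 1, best_child: None }, // B
    ];
    // A validator moves its vote from A to B: deltas A = -1, B = +1.
    let mut deltas = vec![0i64, -1, 1];
    apply_score_changes(&mut nodes, &mut deltas);

    assert_eq!(nodes[1].weight, 1); // A
    assert_eq!(nodes[2].weight, 2); // B
    assert_eq!(nodes[0].best_child, Some(2)); // B correctly wins
    println!("best_child = {:?}", nodes[0].best_child);
}
```

Note that with best_child cleared, a zero-weight child can still become best_child of a parent with no better sibling; a production version would add a viability check on top of this sketch.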

Bug 2 — Node weights not zeroed after prune + vote_tracker.reset() (store.rs:808–812)

After prune the surviving nodes retain their pre-pruning weights. After vote_tracker.reset() all old_root slots are None, so the next compute_deltas call treats every current attestation as a fresh +1. The result is:

new_weight = pre_prune_accumulated_weight + current_attestation_count

instead of just current_attestation_count. This inflates weights non-uniformly across nodes (different pre-pruning histories), which can flip relative comparisons and produce a different head than the spec implementation.

Example: after pruning, fork with children C (pre-prune weight=5) and D (pre-prune weight=1). Current attestations: 3 for D, 4 for C. Proto-array gives C=9, D=4 → head=C. Spec gives C=4, D=3 → head=C. Agree here. But with 4 for D, 3 for C: proto-array gives C=8, D=5 → head=C. Spec gives D=4, C=3 → head=D. Divergence.

Fix: Zero all node weights before the first apply_score_changes call after a reset, or add a reset_weights() method to ProtoArray called alongside vote_tracker.reset().


apply_score_changes mutates caller's slice without documenting the side-effect (proto_array.rs:79)

pub fn apply_score_changes(&mut self, deltas: &mut [i64]) {

The function propagates child deltas upward by writing into deltas[parent_idx] in place. This is an invisible side-effect on the caller's data. All current callers discard deltas afterwards, so it's safe in practice, but the doc comment should be explicit. Additionally, the guard parent_idx < deltas.len() on line 90 is always true (parent index < child index < deltas.len()) — the check is dead code and slightly misleading.


slot field is write-only (proto_array.rs:31)

ProtoNode::slot is stored in on_block but never read anywhere. The #[allow(dead_code)] suppresses the warning instead of addressing it. If slot is reserved for future pruning-by-slot, a comment saying so would be clearer.


ForkChoiceState fields are fully public (store.rs:633–635)

pub struct ForkChoiceState {
    pub proto_array: ProtoArray,
    pub vote_tracker: VoteTracker,
}

These are co-dependent state: pruning proto_array without resetting vote_tracker (or vice versa) produces the Bug 2 scenario. Encapsulating mutation behind methods on ForkChoiceState (e.g., prune_and_reset(root), update(attestations)) would prevent accidental misuse as the codebase grows.


signature_spectests.rs creates fresh ForkChoiceState inside the per-block loop (signature_spectests.rs:905)

This is correct for the current test structure (each test body processes exactly one block), but it silently loses any fork-choice state between loop iterations. If the test loop ever grows to process multiple blocks for one test case, the fresh fc would miss the history. A ForkChoiceState initialized once before the loop (like forkchoice_spectests.rs does) would be more robust.


Positive observations

  • The debug oracle pattern (#[cfg(debug_assertions)] asserting proto-array equals spec) is the right approach for validating an optimized fork-choice path — the 27 passing spec tests give meaningful coverage even if they don't exercise every edge case.
  • prune correctly maintains topological ordering invariants (parents always have lower indices than children, which is guaranteed by on_block_core requiring parent state before processing a child block).
  • The from_store sort by slot before insertion correctly establishes the topological order needed by prune and apply_score_changes.
  • Idiomatic use of let … else, inspect_err, and tracing shorthand throughout the diff matches the project conventions in CLAUDE.md.

Summary

Severity | Location | Issue
Critical | proto_array.rs:175 | Stale best_child when current best loses votes to a sibling
Critical | store.rs:811 | Node weights not zeroed after prune + reset()
Minor | proto_array.rs:31 | slot field is write-only; #[allow(dead_code)] hides the signal
Minor | store.rs:633 | ForkChoiceState fields should be encapsulated to enforce safe co-mutation
Nit | proto_array.rs:90 | Dead guard parent_idx < deltas.len() + undocumented slice mutation

Automated review by Claude (Anthropic) · custom prompt

@pablodeymo pablodeymo marked this pull request as ready for review March 18, 2026 21:34
@pablodeymo pablodeymo marked this pull request as draft March 18, 2026 21:34
@github-actions

🤖 Kimi Code Review

Overall Assessment: Well-structured implementation of the proto-array fork choice optimization with good test coverage and a debug oracle for spec compliance. The code is memory-safe and follows Rust idioms. Critical consensus logic appears correct (LMD GHOST with lexicographic tie-breaking, proper weight propagation).

Issues Found:

**1. Lost


Automated review by Kimi (Moonshot AI) · kimi-k2.5 · custom prompt

@github-actions

🤖 Codex Code Review

  1. Finalization handling double-counts all surviving votes. In store.rs, finalization prunes the proto-array and calls vote_tracker.reset(), but proto_array.rs keeps each retained node’s existing weight/best_child. The next update_head then recomputes deltas from an empty tracker and adds every still-known attestation again, so subtree weights are inflated after every finalization. That can skew later fork-choice decisions even if the immediate head stays unchanged. Rebuild ForkChoiceState from store after finalization, or explicitly zero proto-array weights before replaying votes.

  2. Vote removal is missing when a validator disappears from the current attestation snapshot. VoteTracker::compute_deltas only updates validators present in new_attestations (proto_array.rs), but the snapshot comes from bounded FIFO payload buffers (store.rs, store.rs, store.rs). When an entry is evicted, that validator simply vanishes from extract_latest_known_attestations(), so their previous vote stays in self.votes and in accumulated weights forever. Under buffer pressure or long finalization stalls, fork choice will diverge from the spec recomputation. The tracker needs an explicit “missing validator => remove old vote” path, or the incremental state must be rebuilt whenever the source set can shrink.

I could not run the Rust test suite here: cargo test is blocked by the sandboxed environment trying to write under ~/.cargo/~/.rustup and by disabled network access for the git dependency.


Automated review by OpenAI Codex · gpt-5.4 · custom prompt

@github-actions

🤖 Claude Code Review



PR 235 – Proto-Array Incremental Fork Choice: Review

Summary

Good architectural direction. The PR correctly introduces ProtoArray + VoteTracker and wires them through the store/actor boundary cleanly. The debug oracle (#[cfg(debug_assertions)]) is a smart safety net and the 13 unit tests cover the happy path well. However, there are two algorithmic correctness bugs that can produce wrong heads in production.


Critical: best_child is never reset between apply_score_changes calls

File: crates/blockchain/fork_choice/src/proto_array.rs, lines 79–97 and 175–191

maybe_update_best_child can only promote a new winner — it never clears best_child when the current best loses weight. Combined with the fact that the backward pass processes higher-indexed children before lower-indexed ones, a scenario where the current best_child has stale (pre-delta) weight when compared causes the wrong child to persist.

Concrete failing case:

anchor(h(0)) -> a(h(3)) inserted first, index 1
anchor(h(0)) -> b(h(1)) inserted second, index 2

Round 1: 3 votes for a(h(3))  →  best_child = a, weight 3
Round 2: all 3 votes move to b(h(1))  →  deltas: a=−3, b=+3

Backward pass (reverse index order):
  i=2 (b=h(1)): b.weight updated to 3.
    Compare (3, h(1)) vs best_child=a (weight still 3, not yet updated).
    h(1) < h(3) → tie broken in favour of a. best_child stays a.
  i=1 (a=h(3)): a.weight updated to 0.
    Compare (0, h(3)) vs best_child=a (weight now 0). Same node. No update.

Result: best_child = a (weight 0), b (weight 3) is ignored. Wrong head returned.

The fix is to reset all best_child pointers to None at the top of apply_score_changes before iterating, then let the backward pass reconstruct them. This is O(nodes) and fits within the existing loop:

pub fn apply_score_changes(&mut self, deltas: &mut [i64]) {
    // Reset best_child so the pass below always reflects current weights.
    for node in &mut self.nodes {
        node.best_child = None;
    }
    for i in (0..self.nodes.len()).rev() {
        // ... existing weight update and propagation
    }
}

Correctness: weight accumulation after vote_tracker.reset()

File: crates/blockchain/src/store.rs, lines 808–812 (on_block_core) and proto_array.rs line 256

When finalization advances, vote_tracker.reset() clears all old votes but proto_array node weights are not zeroed. On the next compute_deltas call, every validator's old_root is None, so every vote is treated as fresh and adds +1 to the node. The nodes already carry the cumulative weight from before pruning, so votes are effectively double-counted.

Over enough finalization cycles, subtree weights drift arbitrarily far from their true values. The relative ordering may still be correct in most cases (explaining why tests pass), but the oracle comparison will also drift since the spec recomputes from zero each time and the attestation pool after pruning may differ.

Options:

  • Zero all node weights before replaying attestations after a prune+reset, or
  • Don't call vote_tracker.reset() — instead rely on the fact that pruned node indices are remapped and the tracker's stale entries for pruned roots will simply produce no delta (the get_index call returns None). This avoids the re-initialization cost entirely and keeps incremental semantics correct.
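The first option can be sketched with a hypothetical reset_weights() that zeroes every surviving node's weight alongside vote_tracker.reset(), so the next delta pass rebuilds weights purely from current attestations (names and the flat weights vector are illustrative):

```rust
// Fix sketch: zero surviving node weights after prune + tracker reset so
// replayed attestations are not added on top of pre-prune totals.

struct ProtoArray {
    weights: Vec<i64>, // one accumulated weight per surviving node
}

impl ProtoArray {
    fn reset_weights(&mut self) {
        self.weights.iter_mut().for_each(|w| *w = 0);
    }

    fn apply_deltas(&mut self, deltas: &[i64]) {
        for (w, d) in self.weights.iter_mut().zip(deltas) {
            *w += d;
        }
    }
}

fn main() {
    // Surviving nodes C and D keep pre-prune weights 5 and 1.
    let mut pa = ProtoArray { weights: vec![5, 1] };

    // Without the reset, replaying 3 votes for C and 4 for D would give
    // C = 8, D = 5: the double-counting divergence described above.
    pa.reset_weights();
    pa.apply_deltas(&[3, 4]);

    assert_eq!(pa.weights, vec![3, 4]); // matches the spec recomputation
    println!("{:?}", pa.weights);
}
```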

Minor issues

Spurious length guards in apply_score_changes (line 81, 90):

if i < deltas.len() { ... }
if i < deltas.len() && parent_idx < deltas.len() { ... }

deltas is always vec![0i64; proto_array.len()] from compute_deltas, so i and parent_idx are always in bounds. The guards are misleading — they suggest the function can be called with an undersized delta slice. Either document the invariant with a debug_assert_eq!(deltas.len(), self.nodes.len()) at the top, or remove the guards.

ForkChoiceState fields are pub (store.rs lines 633–635):

Both proto_array and vote_tracker are exposed. Since ForkChoiceState is used exclusively within the blockchain crate, these could be pub(crate) to avoid unintended external mutation.

ProtoNode carries a slot field that is never used (proto_array.rs line 31):

The field is stored but not read by any algorithm path (#[allow(dead_code)] confirms this). If it's being kept for future use or debugging, a comment saying so would help; otherwise remove it.

signature_spectests.rs creates a fresh ForkChoiceState per block (line 905):

let mut fc = store::ForkChoiceState::from_store(&st);
store::on_tick(&mut st, &mut fc, block_time_ms, true, false);
let result = store::on_block(&mut st, &mut fc, signed_block, &[]);

Since fc is rebuilt from scratch before each block, the incremental benefit is lost in these tests and the proto-array isn't being exercised in a realistic accumulating-state scenario. This is fine for a signature test (correctness of the block pipeline is the goal), but consider whether a test that holds fc across multiple blocks would be useful for catching the weight accumulation issue above.

prune relies on topological ordering but doesn't enforce it (lines 127–133):

The algorithm assumes children always have higher indices than their parents. This holds as long as callers insert blocks in slot order (which from_store enforces via sort_by_key), but on_block itself has no guard. A debug_assert or inline comment stating the invariant would reduce risk.


Tests

The 13 unit tests are well-structured and cover the core cases. Two gaps worth adding:

  • A test that exercises weight decrease (votes moving away from the current best_child) — this would directly catch the stale-best_child bug described above.
  • A test that runs prune + continued voting without calling vt.reset(), to validate the non-reset path once it's adopted as the fix.

Automated review by Claude (Anthropic) · sonnet · custom prompt

@greptile-apps
Contributor

greptile-apps bot commented Mar 18, 2026

Greptile Summary

This PR replaces the O(validators × chain_depth) compute_lmd_ghost_head call in update_head with an incremental proto-array data structure (ProtoArray + VoteTracker) that propagates weight deltas in a single backward O(nodes) pass and resolves the head in O(depth). The architecture is clean: ForkChoiceState is owned by BlockChainServer, initialised from the live chain at startup, updated on every block and tick, and pruned on finalization. A #[cfg(debug_assertions)] oracle runs both implementations in parallel and asserts equality on every update_head invocation.

Two correctness bugs are present in the core algorithm:

  • Stale weight comparison in maybe_update_best_child (proto_array.rs:155–171): during the backward pass, a challenger child (higher index, already updated) is compared against the current best_child's pre-update weight. When the challenger's new weight exceeds the best_child's new weight but not its old weight, and the root-hash tiebreak favours the current best_child, the head pointer is not updated — returning the wrong fork head. No existing test covers the case where the lower-indexed sibling has the higher root hash.
  • Weight double-counting after prune+reset (store.rs:667–672): prune() clones surviving nodes with their accumulated subtree weights, then vote_tracker.reset() discards all vote history. The next compute_deltas call treats every current attestation as a new vote, adding fresh deltas on top of already-accumulated weights. As long as vote distribution is uniform this does not change the head selection, but divergent vote distributions (especially as validators change targets across finalization epochs) can corrupt relative weights and eventually cause the debug oracle assertion to fire in debug builds or silent head misselection in release builds.
  • Zero-weight nodes silently become best_child (proto_array.rs:59–77): maybe_update_best_child sets best_child = Some(i) whenever the parent has no prior best child, regardless of whether i has any votes. A vote for a non-leaf node will cause find_head to walk past that node into a zero-weight child. Rare in production (validators always vote for the tip), but reachable during startup with stale attestations.

Confidence Score: 2/5

  • Not safe to merge — two algorithmic correctness bugs in the proto-array core can produce wrong fork choice heads under specific but realistic vote-weight shift and finalization scenarios.
  • The architecture and integration wiring are solid, and the debug oracle is an excellent safeguard. However, the stale-weight comparison bug in maybe_update_best_child is a deterministic logic error that will produce incorrect head selection when two siblings have the same weight and their root hashes break the tie in a specific order. The prune+reset weight double-counting introduces persistent state corruption after every finalization epoch that compounds over time. Both bugs are masked by the test suite's choice of monotonically increasing root hashes (h(1) < h(2) always) and the absence of multi-finalization-epoch test scenarios.
  • crates/blockchain/fork_choice/src/proto_array.rs (both core algorithmic bugs) and crates/blockchain/src/store.rs (prune+reset double-counting).

Important Files Changed

Filename — Overview
crates/blockchain/fork_choice/src/proto_array.rs — New ProtoArray + VoteTracker implementation; contains a stale-weight comparison bug in maybe_update_best_child that can select the wrong head, and a zero-weight node guard is missing.
crates/blockchain/src/store.rs — Integrates ForkChoiceState into all store functions; the prune+reset pattern after finalization preserves node weights while clearing vote history, causing double-counted weights on the next update.
crates/blockchain/fork_choice/src/lib.rs — Trivial change: adds mod proto_array and re-exports ProtoArray + VoteTracker; no issues.
crates/blockchain/src/lib.rs — Wires ForkChoiceState into BlockChainServer and threads it through on_tick, on_block, and produce_block_with_signatures; mechanical changes, no logic issues.
crates/blockchain/tests/forkchoice_spectests.rs — Updated spec tests to thread ForkChoiceState through correctly; fc is created once per test case and reused across all steps as intended.
crates/blockchain/tests/signature_spectests.rs — Creates a fresh ForkChoiceState for each independent single-block test case, which is correct since each test has its own isolated store.

Sequence Diagram

sequenceDiagram
    participant BCS as BlockChainServer
    participant Store
    participant FCS as ForkChoiceState
    participant PA as ProtoArray
    participant VT as VoteTracker

    note over BCS: Startup
    BCS->>Store: get_live_chain()
    Store-->>BCS: blocks (sorted by slot)
    BCS->>PA: on_block(root, parent, slot) ×N
    BCS->>Store: extract_latest_known_attestations()
    Store-->>BCS: attestations
    BCS->>VT: compute_deltas(attestations, pa)
    VT-->>BCS: deltas[]
    BCS->>PA: apply_score_changes(deltas)
    note over BCS: ForkChoiceState ready

    note over BCS: New block arrives (on_block_core)
    BCS->>Store: state_transition(block)
    Store-->>BCS: post_state (justified, finalized)
    alt finalization advanced
        BCS->>PA: prune(finalized.root)
        BCS->>VT: reset()
    end
    BCS->>PA: on_block(block_root, parent, slot)
    BCS->>FCS: update_head(store, fc, false)
    FCS->>Store: extract_latest_known_attestations()
    Store-->>FCS: attestations
    FCS->>VT: compute_deltas(attestations, pa)
    VT-->>FCS: deltas[]
    FCS->>PA: apply_score_changes(deltas)
    PA-->>FCS: (weights + best_child updated)
    FCS->>PA: find_head(justified_root)
    PA-->>FCS: new_head
    FCS->>Store: update_checkpoints(head_only(new_head))

    note over BCS: Slot tick (interval 0/4)
    BCS->>Store: promote_new_aggregated_payloads()
    BCS->>FCS: update_head(store, fc, log_tree)
This is a comment left during a code review.
Path: crates/blockchain/fork_choice/src/proto_array.rs
Line: 155-171

Comment:
**Stale weight comparison in `maybe_update_best_child` can select incorrect head**

`maybe_update_best_child` compares the challenger child's *updated* weight against the current `best_child`'s *stale* (pre-update) weight. Because `apply_score_changes` iterates backward (highest index first), a sibling with a **lower** index (the current `best_child`) has not yet had its delta applied at the time the comparison is made.

Concrete failure scenario:
- `root(0) -> a(idx=1, root=h(170)), b(idx=2, root=h(100))` — both children of root
- Initial weights: `a=10`, `b=5`, `best_child[root]=a`
- Deltas this round: `a=-8`, `b=+5`
- **Expected after update**: `a.weight=2`, `b.weight=10` → `b` should become `best_child`

Trace through `apply_score_changes`:
1. Process `b` (idx=2): `b.weight = 10`. Call `maybe_update_best_child(root, b)`. Compares `(10, h(100))` vs `a`'s **stale** `(10, h(170))`. Since `(10, h(100)) < (10, h(170))`, `a` wins. `best_child[root]` stays `a`.
2. Process `a` (idx=1): `a.weight = 2`. Call `maybe_update_best_child(root, a)`. `best_child` is still `a`, so it compares `a` against **itself** — a no-op.

Final result: `best_child[root] = a` (weight 2) even though `b.weight = 10`. The head is wrong.

The root cause is that when the lower-indexed sibling (`a`) is compared in step 2, it's compared against itself rather than against `b`. The fix requires either comparing all children after all weights are updated, or using a second pass to recompute `best_child` pointers.

None of the existing tests catch this because they use `h(1) < h(2)` (byte-ordered), so the higher-rooted node always wins. A test with the same weights but reversed root hash ordering would expose it.


---

This is a comment left during a code review.
Path: crates/blockchain/src/store.rs
Line: 667-672

Comment:
**Pruning preserves node weights but resets vote history, causing weight double-counting**

After `fc.proto_array.prune(finalized.root)`, the surviving nodes **retain their accumulated subtree weights** (via `node.clone()` in `prune`). Then `fc.vote_tracker.reset()` clears all vote history. On the very next `update_head` call, `compute_deltas` treats every existing attestation as a **brand-new vote** (no `old_root` to subtract), adding +1 per validator for their current target. This inflates all surviving node weights by the current vote count on top of the already-accumulated historical weights.

Example:
- After finalization, `C1.weight = 3` (3 validators voted), `C2.weight = 1`
- prune + reset
- Next `compute_deltas` with the same 3+1 validators: `C1` gets delta +3, `C2` gets +1
- `apply_score_changes`: `C1.weight = 6`, `C2.weight = 2`

If the distribution is uniform, head selection is unaffected (relative ordering preserved). But if validators subsequently shift votes, the inflated baseline distorts relative weights:

- Next round: 1 validator moves from `C1` to `C2`
- Proto: `C1.weight = 5`, `C2.weight = 3` → `C1` wins
- Spec: `C1` has 2 votes, `C2` has 2 votes → **tied, tiebreak by root**

Depending on root hashes, proto-array and spec may disagree, triggering the debug oracle assertion in `debug_assertions` builds. In release builds the corruption silently persists, compounding on each finalization.

The fix is to also reset each surviving node's weight to 0 after pruning (or rebuild from `from_store`), so the weight reflects only post-prune votes.


---

This is a comment left during a code review.
Path: crates/blockchain/fork_choice/src/proto_array.rs
Line: 59-77

Comment:
**Zero-weight nodes can be set as `best_child`, pulling head past the voted block**

`apply_score_changes` calls `maybe_update_best_child(parent, i)` for **every** node in the backward pass, including nodes whose net weight remains 0. Because `maybe_update_best_child` unconditionally sets `best_child = Some(child_idx)` when the parent has no prior `best_child` (`None => true`), a zero-weight leaf can become the `best_child` of an ancestor even when the ancestor itself has positive votes.

Minimal reproducer:

```rust
// chain: root(0) -> a(1) -> b(2)
// validator votes for a (the middle node), not the leaf b
pa.on_block(root, ZERO, 0);
pa.on_block(a, root, 1);
pa.on_block(b, a, 2);   // b has no votes

let mut att = HashMap::new();
att.insert(0, make_attestation(a, 1));   // vote for a, not b
let mut deltas = vt.compute_deltas(&att, &pa);
pa.apply_score_changes(&mut deltas);

// deltas = [0, +1, 0]; when processing b (idx=2, weight=0):
//   a has no best_child → best_child[a] = b  (weight 0!)
// find_head(root) → root → a → b  (WRONG; should stop at a)
assert_eq!(pa.find_head(root), a);  // fails: returns b
```

In practice Ethereum validators always vote for the chain tip, so this is unlikely to trigger with real attestations. It would however be exposed if the proto-array is initialised via `from_store` with stale attestations that don't reach the current tip, or during startup with a partially-synced chain. Adding a guard `if child_weight > 0` (or `>= 0` with a more refined viability check) in `maybe_update_best_child` would prevent zero-weight nodes from becoming `best_child`.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: crates/blockchain/src/store.rs
Line: 88-114

Comment:
**Debug oracle re-uses `attestations` after it was consumed by `compute_deltas`**

`extract_latest_known_attestations()` is called once (line 89) and the result is passed to both `compute_deltas` (line 95) and the debug oracle's `compute_lmd_ghost_head` (line 103–108). `compute_deltas` takes `&HashMap<…>` so the value is not moved, but both calls **see the same snapshot**. This is intentional and correct.

However, the oracle also calls `store.get_live_chain()` (line 102) **inside** the `debug_assertions` block, meaning it is only evaluated in debug builds. If any future refactor moves the `get_live_chain()` call to before `store.update_checkpoints(ForkCheckpoints::head_only(new_head))` at line 122, the live chain would include the new head set by the proto-array result, creating a circular dependency. Consider adding a comment that the oracle must use the chain state **before** `update_checkpoints` to remain valid.

How can I resolve this? If you propose a fix, please make it concise.

Last reviewed commit: "Merge branch 'main' ..."

Comment on lines +155 to +171
```rust
fn maybe_update_best_child(&mut self, parent_idx: usize, child_idx: usize) {
    let child_weight = self.nodes[child_idx].weight;
    let child_root = self.nodes[child_idx].root;

    let dominated = match self.nodes[parent_idx].best_child {
        None => true,
        Some(current_best) => {
            let best_weight = self.nodes[current_best].weight;
            let best_root = self.nodes[current_best].root;
            (child_weight, child_root) > (best_weight, best_root)
        }
    };

    if dominated {
        self.nodes[parent_idx].best_child = Some(child_idx);
    }
}
```

**P1: Stale weight comparison in `maybe_update_best_child` can select incorrect head**

(Path: `crates/blockchain/fork_choice/src/proto_array.rs`, lines 155-171)

`maybe_update_best_child` compares the challenger child's *updated* weight against the current `best_child`'s *stale* (pre-update) weight. Because `apply_score_changes` iterates backward (highest index first), a sibling with a **lower** index (the current `best_child`) has not yet had its delta applied at the time the comparison is made.

Concrete failure scenario:

- `root(0) -> a(idx=1, root=h(170)), b(idx=2, root=h(100))` — both children of root
- Initial weights: `a=10`, `b=5`, `best_child[root]=a`
- Deltas this round: `a=-8`, `b=+5`
- **Expected after update**: `a.weight=2`, `b.weight=10` → `b` should become `best_child`

Trace through `apply_score_changes`:

1. Process `b` (idx=2): `b.weight = 10`. Call `maybe_update_best_child(root, b)`. Compares `(10, h(100))` vs `a`'s **stale** `(10, h(170))`. Since `(10, h(100)) < (10, h(170))`, `a` wins. `best_child[root]` stays `a`.
2. Process `a` (idx=1): `a.weight = 2`. Call `maybe_update_best_child(root, a)`. `best_child` is still `a`, so it compares `a` against **itself** — a no-op.

Final result: `best_child[root] = a` (weight 2) even though `b.weight = 10`. The head is wrong.

The root cause is that when the lower-indexed sibling (`a`) is compared in step 2, it's compared against itself rather than against `b`. The fix requires either comparing all children after all weights are updated, or using a second pass to recompute `best_child` pointers.

None of the existing tests catch this because they use `h(1) < h(2)` (byte-ordered), so the higher-rooted node always wins. A test with the same weights but reversed root hash ordering would expose it.

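One way to implement the second-pass fix, sketched here on a simplified standalone node array (the `Node` type and helper signature below are hypothetical stand-ins, not the PR's actual code): pass 1 applies and propagates every delta; pass 2 rebuilds the `best_child` pointers only after all weights are final, so no comparison can see a stale sibling weight.

```rust
struct Node {
    root: [u8; 32],
    parent: Option<usize>,
    weight: i64,
    best_child: Option<usize>,
}

fn apply_score_changes(nodes: &mut [Node], deltas: &mut [i64]) {
    // Pass 1: apply each node's delta and propagate it to its parent.
    // Backward iteration guarantees children are processed before parents.
    for i in (0..nodes.len()).rev() {
        if i < deltas.len() {
            nodes[i].weight += deltas[i];
        }
        let Some(parent_idx) = nodes[i].parent else { continue };
        if i < deltas.len() && parent_idx < deltas.len() {
            deltas[parent_idx] += deltas[i];
        }
    }

    // Pass 2: all weights are final; rebuild best_child from scratch so
    // every sibling comparison uses post-update weights.
    for node in nodes.iter_mut() {
        node.best_child = None;
    }
    for i in 0..nodes.len() {
        let Some(parent_idx) = nodes[i].parent else { continue };
        let challenger = (nodes[i].weight, nodes[i].root);
        let wins = match nodes[parent_idx].best_child {
            None => true,
            Some(best) => challenger > (nodes[best].weight, nodes[best].root),
        };
        if wins {
            nodes[parent_idx].best_child = Some(i);
        }
    }
}
```

With the reviewer's scenario (`a` at idx 1 with the higher root, deltas `a=-8`, `b=+5`), this version ends with `best_child[root] = b`, at the cost of an extra O(nodes) sweep per call.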

Comment on lines +667 to +672
```rust
// Prune proto-array on finalization advance and reset vote tracker
// (must happen after store.update_checkpoints which prunes storage)
if let Some(finalized) = finalized {
    fc.proto_array.prune(finalized.root);
    fc.vote_tracker.reset();
}
```
**P1: Pruning preserves node weights but resets vote history, causing weight double-counting**

(Path: `crates/blockchain/src/store.rs`, lines 667-672)

After `fc.proto_array.prune(finalized.root)`, the surviving nodes **retain their accumulated subtree weights** (via `node.clone()` in `prune`). Then `fc.vote_tracker.reset()` clears all vote history. On the very next `update_head` call, `compute_deltas` treats every existing attestation as a **brand-new vote** (no `old_root` to subtract), adding +1 per validator for their current target. This inflates all surviving node weights by the current vote count on top of the already-accumulated historical weights.

Example:

- After finalization, `C1.weight = 3` (3 validators voted), `C2.weight = 1`
- prune + reset
- Next `compute_deltas` with the same 3+1 validators: `C1` gets delta +3, `C2` gets +1
- `apply_score_changes`: `C1.weight = 6`, `C2.weight = 2`

If the distribution is uniform, head selection is unaffected (relative ordering preserved). But if validators subsequently shift votes, the inflated baseline distorts relative weights:

- Next round: 1 validator moves from `C1` to `C2`
- Proto: `C1.weight = 5`, `C2.weight = 3` → `C1` wins
- Spec: `C1` has 2 votes, `C2` has 2 votes → **tied, tiebreak by root**

Depending on root hashes, proto-array and spec may disagree, triggering the debug oracle assertion in `debug_assertions` builds. In release builds the corruption silently persists, compounding on each finalization.

The fix is to also reset each surviving node's weight to 0 after pruning (or rebuild from `from_store`), so the weight reflects only post-prune votes.

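A minimal sketch of the weight-reset option, using a simplified hypothetical `Node` type (field names mirror the PR, but this is illustrative rather than the PR's code): call it right after `prune` + `vote_tracker.reset()`, so the next `compute_deltas` pass rebuilds weights from post-reset votes only instead of stacking on top of historical totals.

```rust
struct Node {
    parent: Option<usize>,
    weight: i64,
    best_child: Option<usize>,
}

/// To be called immediately after pruning and vote_tracker.reset():
/// with vote history gone, every surviving weight must also restart
/// from zero or the next delta pass double-counts the same validators.
fn reset_weights_after_prune(nodes: &mut [Node]) {
    for node in nodes.iter_mut() {
        node.weight = 0;
        // best_child is stale once weights are zeroed; the next
        // apply_score_changes pass re-derives it from fresh deltas.
        node.best_child = None;
    }
}
```

Rebuilding via `from_store` achieves the same invariant but re-reads the whole chain; zeroing in place is O(nodes) with no storage access.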

Comment on lines +59 to +77
```rust
pub fn apply_score_changes(&mut self, deltas: &mut [i64]) {
    for i in (0..self.nodes.len()).rev() {
        if i < deltas.len() {
            self.nodes[i].weight += deltas[i];
        }

        let Some(parent_idx) = self.nodes[i].parent else {
            continue;
        };

        // Propagate this node's delta to parent
        if i < deltas.len() && parent_idx < deltas.len() {
            deltas[parent_idx] += deltas[i];
        }

        // Update best_child: pick the child with highest weight, tiebreak by root hash
        self.maybe_update_best_child(parent_idx, i);
    }
}
```
**P2: Zero-weight nodes can be set as `best_child`, pulling head past the voted block**

(Path: `crates/blockchain/fork_choice/src/proto_array.rs`, lines 59-77)

`apply_score_changes` calls `maybe_update_best_child(parent, i)` for **every** node in the backward pass, including nodes whose net weight remains 0. Because `maybe_update_best_child` unconditionally sets `best_child = Some(child_idx)` when the parent has no prior `best_child` (`None => true`), a zero-weight leaf can become the `best_child` of an ancestor even when the ancestor itself has positive votes.

Minimal reproducer:

```rust
// chain: root(0) -> a(1) -> b(2)
// validator votes for a (the middle node), not the leaf b
pa.on_block(root, ZERO, 0);
pa.on_block(a, root, 1);
pa.on_block(b, a, 2);   // b has no votes

let mut att = HashMap::new();
att.insert(0, make_attestation(a, 1));   // vote for a, not b
let mut deltas = vt.compute_deltas(&att, &pa);
pa.apply_score_changes(&mut deltas);

// deltas = [0, +1, 0]; when processing b (idx=2, weight=0):
//   a has no best_child → best_child[a] = b  (weight 0!)
// find_head(root) → root → a → b  (WRONG; should stop at a)
assert_eq!(pa.find_head(root), a);  // fails: returns b
```

In practice Ethereum validators always vote for the chain tip, so this is unlikely to trigger with real attestations. It would however be exposed if the proto-array is initialised via `from_store` with stale attestations that don't reach the current tip, or during startup with a partially-synced chain. Adding a guard `if child_weight > 0` (or `>= 0` with a more refined viability check) in `maybe_update_best_child` would prevent zero-weight nodes from becoming `best_child`.

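A minimal sketch of the suggested guard, written against a simplified standalone `Node` type (hypothetical, not the PR's exact method signature): bail out before a zero-weight child can ever be recorded as `best_child`, so `find_head` stops at the deepest block that actually has votes.

```rust
struct Node {
    root: [u8; 32],
    weight: i64,
    best_child: Option<usize>,
}

// Same (weight, root) comparison as the PR's maybe_update_best_child,
// plus the proposed guard: unvoted children never become best_child.
fn maybe_update_best_child(nodes: &mut [Node], parent_idx: usize, child_idx: usize) {
    let child_weight = nodes[child_idx].weight;
    if child_weight == 0 {
        return; // guard: a zero-weight node cannot pull the head forward
    }
    let child_root = nodes[child_idx].root;
    let wins = match nodes[parent_idx].best_child {
        None => true,
        Some(best) => (child_weight, child_root) > (nodes[best].weight, nodes[best].root),
    };
    if wins {
        nodes[parent_idx].best_child = Some(child_idx);
    }
}
```

With this guard, the reproducer's unvoted leaf `b` leaves `best_child[a]` as `None`, and head selection terminates at `a`.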

Comment on lines +88 to +114
```rust
fn update_head(store: &mut Store, fc: &mut ForkChoiceState, log_tree: bool) {
    let attestations = store.extract_latest_known_attestations();
    let justified_root = store.latest_justified().root;

    // Incremental fork choice via proto-array
    let mut deltas = fc
        .vote_tracker
        .compute_deltas(&attestations, &fc.proto_array);
    fc.proto_array.apply_score_changes(&mut deltas);
    let new_head = fc.proto_array.find_head(justified_root);

    // Debug oracle: verify proto-array matches spec implementation
    #[cfg(debug_assertions)]
    {
        let blocks = store.get_live_chain();
        let (spec_head, _) = ethlambda_fork_choice::compute_lmd_ghost_head(
            justified_root,
            &blocks,
            &attestations,
            0,
        );
        assert_eq!(
            new_head, spec_head,
            "proto-array diverged from spec: proto={:?} spec={:?}",
            new_head, spec_head
        );
    }
```

**P2: Debug oracle re-uses `attestations` after it was consumed by `compute_deltas`**

(Path: `crates/blockchain/src/store.rs`, lines 88-114)

`extract_latest_known_attestations()` is called once (line 89) and the result is passed to both `compute_deltas` (line 95) and the debug oracle's `compute_lmd_ghost_head` (lines 103-108). `compute_deltas` takes `&HashMap<…>` so the value is not moved, but both calls **see the same snapshot**. This is intentional and correct.

However, the oracle also calls `store.get_live_chain()` (line 102) **inside** the `debug_assertions` block, meaning it is only evaluated in debug builds. If any future refactor moves the `get_live_chain()` call to before `store.update_checkpoints(ForkCheckpoints::head_only(new_head))` at line 122, the live chain would include the new head set by the proto-array result, creating a circular dependency. Consider adding a comment that the oracle must use the chain state **before** `update_checkpoints` to remain valid.


```rust
let mut proto_array = ProtoArray::new();

// Load all blocks from live chain and insert in topological order (by slot)
let blocks = store.get_live_chain();
```
We should remove the livechain

```rust
    }
    3 => {
        // Update safe target for validators (uses spec implementation, not proto-array)
```
Can we use proto-array here too?

Comment on lines +93 to +96
```rust
let mut deltas = fc
    .vote_tracker
    .compute_deltas(&attestations, &fc.proto_array);
fc.proto_array.apply_score_changes(&mut deltas);
```
Should these be a single function?
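One way to fold the pair into a single entry point, sketched with hypothetical stub types (only the method shape matters; the real `compute_deltas`/`apply_score_changes` bodies live in the PR): a `ForkChoiceState` method so callers can't compute deltas without applying them, or apply stale deltas twice.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the PR's types, stubbed to be self-contained.
struct ProtoArray { applied: bool }
struct VoteTracker;

impl VoteTracker {
    // Stub: the real compute_deltas diffs new attestations against stored votes.
    fn compute_deltas(&mut self, attestations: &HashMap<u64, [u8; 32]>, _pa: &ProtoArray) -> Vec<i64> {
        vec![0; attestations.len()]
    }
}

impl ProtoArray {
    // Stub: the real apply_score_changes propagates deltas up the node array.
    fn apply_score_changes(&mut self, _deltas: &mut [i64]) {
        self.applied = true;
    }
}

struct ForkChoiceState {
    proto_array: ProtoArray,
    vote_tracker: VoteTracker,
}

impl ForkChoiceState {
    /// Single call site: compute vote deltas and immediately apply them.
    /// Disjoint field borrows (&mut vote_tracker, &proto_array) are fine here.
    fn process_attestations(&mut self, attestations: &HashMap<u64, [u8; 32]>) {
        let mut deltas = self
            .vote_tracker
            .compute_deltas(attestations, &self.proto_array);
        self.proto_array.apply_score_changes(&mut deltas);
    }
}
```

`update_head` would then reduce to `fc.process_attestations(&attestations)` followed by `find_head`, and the deltas vector never escapes the method.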
