Skip to content

Conversation

@neekolas
Copy link
Contributor

@neekolas neekolas commented Oct 1, 2025

TL;DR

Added a new method to batch fetch cursor values for multiple entity IDs.

What changed?

Added a new get_last_cursor_for_ids method to the QueryRefreshState trait that allows fetching cursor values for multiple entity IDs in a single operation. The implementation processes IDs in chunks of 900 to stay under SQLite's default bind limit of 999, and returns a HashMap mapping entity IDs to their cursor values.

How to test?

  • Test the new method with various combinations of entity IDs
  • Verify it correctly returns a HashMap with entity IDs as keys and cursor values
  • Test with empty input to ensure it returns an empty HashMap
  • Test with a large number of IDs to verify chunking works correctly

Why make this change?

This change improves efficiency when needing to fetch cursor values for multiple entities at once. Instead of making individual database queries for each entity ID, we can now batch them together, reducing the number of database operations and improving performance, especially when dealing with many entities.

@claude
Copy link

claude bot commented Oct 1, 2025

Claude finished @neekolas's task —— View job


Code Review Complete ✓

Review Date: October 21, 2025, 6:43 AM UTC

Overview

The implementation of get_last_cursor_for_ids is solid and well-tested. The code follows Rust best practices and includes comprehensive test coverage. Below are observations and a few suggestions for improvement.


Code Quality & Best Practices ✅

Strengths:

  • Clean Diesel query implementation with proper chunking
  • Consistent with existing codebase patterns
  • Good separation of concerns (trait + implementation)
  • Proper use of AsRef<[u8]> for flexible ID types

Potential Issues 🔍

1. Schema Field Reference Issue (xmtp_db/src/encrypted_store/refresh_state.rs:358)

The query references dsl::cursor which doesn't exist in the RefreshState struct. The struct uses sequence_id and originator_id fields:

// Line 358 - Current code:
.select((dsl::entity_id, dsl::cursor))

// The RefreshState struct (lines 105-110) shows:
pub struct RefreshState {
    pub entity_id: Vec<u8>,
    pub entity_kind: EntityKind,
    pub sequence_id: i64,
    pub originator_id: i32,
}

Expected behavior: This should be dsl::sequence_id instead of dsl::cursor, or there may be a computed column in the schema. Either way, this needs verification - it will likely cause a compilation error unless cursor is defined in the schema separately.

2. Test Code Uses Non-Existent Fields (Lines 1237-1468)

Multiple tests reference EntityKind::Group and a cursor field that don't exist:

// Line 1237, 1248, 1271, etc.:
let entity_kind = EntityKind::Group;

// Lines 1254, 1282, etc.:
RefreshState {
    entity_id: id.clone(),
    entity_kind,
    cursor: 456,  // Should be: sequence_id and originator_id
}

Impact: These tests won't compile. The code should use EntityKind::ApplicationMessage (or another valid variant) and include both sequence_id and originator_id fields.


Performance Considerations ⚡

Good:

  • Chunking at 900 items is appropriate for SQLite's 999 parameter limit
  • Using chunks() iterator is efficient
  • The HashMap merge approach with flatten().collect() is idiomatic

Minor optimization opportunity:
At line 362, the code could benefit from pre-allocating the HashMap capacity:

// Current (line 362):
.map(|vecs| vecs.into_iter().flatten().collect::<HashMap<_, _>>())

// Suggested:
.map(|vecs| {
    let capacity = ids.len(); // or vecs.iter().map(|v| v.len()).sum()
    vecs.into_iter().flatten().collect::<HashMap<_, _>>()
})

However, this is a micro-optimization and the current code is fine.


Security Concerns 🔒

No security issues identified. The implementation:

  • Uses parameterized queries (no SQL injection risk)
  • Properly handles user input through Diesel's type system
  • Doesn't expose sensitive information

Test Coverage 📊

Excellent coverage with 7 comprehensive test cases covering:

  • ✅ Empty input
  • ✅ Single entity
  • ✅ Multiple entities with mixed results
  • ✅ Exactly 900 items (chunk boundary)
  • ✅ Over 900 items (multi-chunk)
  • ✅ Over 1800 items (3+ chunks)
  • ✅ Entity kind filtering

However: The tests won't compile due to the issues mentioned above (non-existent EntityKind::Group and incorrect struct fields).


Recommendations

  1. Critical: Fix the schema field reference at line 358 - verify whether dsl::cursor exists or should be dsl::sequence_id
  2. Critical: Update all test code to use valid EntityKind variants and proper RefreshState fields with both sequence_id and originator_id
  3. Nice-to-have: Consider adding a test for entity IDs that don't exist in the database to explicitly document the behavior (though line 1312 tests this implicitly)

Overall Assessment: The core implementation is well-designed with good chunking logic and comprehensive tests. The compilation issues with tests need to be resolved before merge.


Copy link
Contributor Author

neekolas commented Oct 1, 2025

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more


How to use the Graphite Merge Queue

Add the label mergequeue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@macroscopeapp
Copy link
Contributor

macroscopeapp bot commented Oct 1, 2025

Add bulk QueryRefreshState::get_last_cursor_for_ids to return entity cursor mapping with 900-ID chunking in refresh_state.rs

Introduce a bulk cursor lookup that queries RefreshState by entity_kind and returns a HashMap<Vec<u8>, i64> keyed by entity_id bytes, performing chunked DB reads at 900 IDs per query to stay under SQLite bind limits.

  • Implement DbConnection<QueryRefreshState>::get_last_cursor_for_ids with early empty handling and 900-ID chunking in refresh_state.rs
  • Add trait method QueryRefreshState::get_last_cursor_for_ids and forwarding impls in refresh_state.rs
  • Add mock QueryRefreshState::get_last_cursor_for_ids in mock.rs
  • Add tests covering empty input, single/multiple IDs, exact 900, >900, >1800 chunking, and entity kind filtering in refresh_state.rs

📍Where to Start

Start with the DbConnection<QueryRefreshState>::get_last_cursor_for_ids implementation in refresh_state.rs.


Macroscope summarized 000ffda.

@codecov
Copy link

codecov bot commented Oct 1, 2025

Codecov Report

❌ Patch coverage is 97.23320% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.12%. Comparing base (75492d6) to head (bd4342c).

Files with missing lines Patch % Lines
xmtp_db/src/encrypted_store/refresh_state.rs 97.23% 7 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2535      +/-   ##
==========================================
+ Coverage   76.98%   77.12%   +0.14%     
==========================================
  Files         283      283              
  Lines       37386    37639     +253     
==========================================
+ Hits        28781    29029     +248     
- Misses       8605     8610       +5     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@neekolas neekolas force-pushed the 09-17-add_sync_conversations_benches branch from 7b49736 to e485c09 Compare October 8, 2025 17:44
@neekolas neekolas force-pushed the 09-10-add-get-latest-cursor-for-ids branch from 28482a2 to 7a8ab42 Compare October 8, 2025 17:44
@neekolas neekolas force-pushed the 09-10-add-get-latest-cursor-for-ids branch from 7a8ab42 to b5fdb49 Compare October 13, 2025 23:54
@neekolas neekolas force-pushed the 09-17-add_sync_conversations_benches branch from e485c09 to 3fedaac Compare October 13, 2025 23:54
@neekolas neekolas changed the title Message subscription overhaul Add get_last_cursor_for_ids DB query Oct 13, 2025
@neekolas neekolas force-pushed the 09-10-add-get-latest-cursor-for-ids branch from b5fdb49 to ea52b7e Compare October 14, 2025 00:03
@neekolas neekolas marked this pull request as ready for review October 14, 2025 00:05
@neekolas neekolas requested a review from a team as a code owner October 14, 2025 00:05
@graphite-app graphite-app bot changed the base branch from 09-17-add_sync_conversations_benches to graphite-base/2535 October 15, 2025 18:41
@neekolas neekolas force-pushed the 09-10-add-get-latest-cursor-for-ids branch from ea52b7e to bd4342c Compare October 16, 2025 20:23
@neekolas neekolas changed the base branch from graphite-base/2535 to main October 16, 2025 20:24
@neekolas neekolas force-pushed the 09-10-add-get-latest-cursor-for-ids branch from bd4342c to 000ffda Compare October 21, 2025 06:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants