Add get_last_cursor_for_ids DB query #2535

neekolas · 2025-10-01T01:01:33Z

TL;DR

Added a new method to batch fetch cursor values for multiple entity IDs.

What changed?

Added a new get_last_cursor_for_ids method to the QueryRefreshState trait that allows fetching cursor values for multiple entity IDs in a single operation. The implementation processes IDs in chunks of 900 to stay under SQLite's default bind limit of 999, and returns a HashMap mapping entity IDs to their cursor values.

How to test?

Test the new method with various combinations of entity IDs
Verify it correctly returns a HashMap with entity IDs as keys and cursor values
Test with empty input to ensure it returns an empty HashMap
Test with a large number of IDs to verify chunking works correctly

Why make this change?

This change improves efficiency when needing to fetch cursor values for multiple entities at once. Instead of making individual database queries for each entity ID, we can now batch them together, reducing the number of database operations and improving performance, especially when dealing with many entities.

claude · 2025-10-01T01:01:48Z

Claude finished @neekolas's task —— View job

Code Review Complete ✓

Review Date: October 21, 2025, 6:43 AM UTC

Overview

The implementation of get_last_cursor_for_ids is solid and well-tested. The code follows Rust best practices and includes comprehensive test coverage. Below are observations and a few suggestions for improvement.

Code Quality & Best Practices ✅

Strengths:

Clean Diesel query implementation with proper chunking
Consistent with existing codebase patterns
Good separation of concerns (trait + implementation)
Proper use of AsRef<[u8]> for flexible ID types

Potential Issues 🔍

1. Schema Field Reference Issue (xmtp_db/src/encrypted_store/refresh_state.rs:358)

The query references dsl::cursor which doesn't exist in the RefreshState struct. The struct uses sequence_id and originator_id fields:

// Line 358 - Current code:
.select((dsl::entity_id, dsl::cursor))

// The RefreshState struct (lines 105-110) shows:
pub struct RefreshState {
    pub entity_id: Vec<u8>,
    pub entity_kind: EntityKind,
    pub sequence_id: i64,
    pub originator_id: i32,
}

Expected behavior: This should be dsl::sequence_id instead of dsl::cursor, or there may be a computed column in the schema. Either way, this needs verification - it will likely cause a compilation error unless cursor is defined in the schema separately.

2. Test Code Uses Non-Existent Fields (Lines 1237-1468)

Multiple tests reference EntityKind::Group and a cursor field that don't exist:

// Line 1237, 1248, 1271, etc.:
let entity_kind = EntityKind::Group;

// Lines 1254, 1282, etc.:
RefreshState {
    entity_id: id.clone(),
    entity_kind,
    cursor: 456,  // Should be: sequence_id and originator_id
}

Impact: These tests won't compile. The code should use EntityKind::ApplicationMessage (or another valid variant) and include both sequence_id and originator_id fields.

Performance Considerations ⚡

Good:

Chunking at 900 items is appropriate for SQLite's 999 parameter limit
Using chunks() iterator is efficient
The HashMap merge approach with flatten().collect() is idiomatic

Minor optimization opportunity:
At line 362, the code could benefit from pre-allocating the HashMap capacity:

// Current (line 362):
.map(|vecs| vecs.into_iter().flatten().collect::<HashMap<_, _>>())

// Suggested:
.map(|vecs| {
    let capacity = ids.len(); // or vecs.iter().map(|v| v.len()).sum()
    vecs.into_iter().flatten().collect::<HashMap<_, _>>()
})

However, this is a micro-optimization and the current code is fine.

Security Concerns 🔒

No security issues identified. The implementation:

Uses parameterized queries (no SQL injection risk)
Properly handles user input through Diesel's type system
Doesn't expose sensitive information

Test Coverage 📊

Excellent coverage with 7 comprehensive test cases covering:

✅ Empty input
✅ Single entity
✅ Multiple entities with mixed results
✅ Exactly 900 items (chunk boundary)
✅ Over 900 items (multi-chunk)
✅ Over 1800 items (3+ chunks)
✅ Entity kind filtering

However: The tests won't compile due to the issues mentioned above (non-existent EntityKind::Group and incorrect struct fields).

Recommendations

Critical: Fix the schema field reference at line 358 - verify whether dsl::cursor exists or should be dsl::sequence_id
Critical: Update all test code to use valid EntityKind variants and proper RefreshState fields with both sequence_id and originator_id
Nice-to-have: Consider adding a test for entity IDs that don't exist in the database to explicitly document the behavior (though line 1312 tests this implicitly)

Overall Assessment: The core implementation is well-designed with good chunking logic and comprehensive tests. The compilation issues with tests need to be resolved before merge.

neekolas · 2025-10-01T01:01:49Z

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more

Add get_last_cursor_for_ids DB query #2535 : 2 dependent PRs (#2443 , #2536 ) 👈 (View in Graphite)
Add sync_conversations benches #2534
Add find_message benches #2533
main

How to use the Graphite Merge Queue

Add the label mergequeue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

_{An organization admin has enabled the Graphite Merge Queue in this repository.} _{Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.}

This stack of pull requests is managed by Graphite. Learn more about stacking.

macroscopeapp · 2025-10-01T01:02:50Z

Add bulk `QueryRefreshState::get_last_cursor_for_ids` to return entity cursor mapping with 900-ID chunking in refresh_state.rs

Introduce a bulk cursor lookup that queries RefreshState by entity_kind and returns a HashMap<Vec<u8>, i64> keyed by entity_id bytes, performing chunked DB reads at 900 IDs per query to stay under SQLite bind limits.

Implement DbConnection<QueryRefreshState>::get_last_cursor_for_ids with early empty handling and 900-ID chunking in refresh_state.rs
Add trait method QueryRefreshState::get_last_cursor_for_ids and forwarding impls in refresh_state.rs
Add mock QueryRefreshState::get_last_cursor_for_ids in mock.rs
Add tests covering empty input, single/multiple IDs, exact 900, >900, >1800 chunking, and entity kind filtering in refresh_state.rs

📍Where to Start

Start with the DbConnection<QueryRefreshState>::get_last_cursor_for_ids implementation in refresh_state.rs.

Macroscope summarized 000ffda.

codecov · 2025-10-01T01:07:10Z

Codecov Report

❌ Patch coverage is 97.23320% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.12%. Comparing base (75492d6) to head (bd4342c).

Files with missing lines	Patch %	Lines
xmtp_db/src/encrypted_store/refresh_state.rs	97.23%	7 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2535      +/-   ##
==========================================
+ Coverage   76.98%   77.12%   +0.14%     
==========================================
  Files         283      283              
  Lines       37386    37639     +253     
==========================================
+ Hits        28781    29029     +248     
- Misses       8605     8610       +5

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

neekolas mentioned this pull request Oct 1, 2025

Add find_message benches #2533

Closed

This was referenced Oct 1, 2025

Add sync_conversations benches #2534

Closed

Sync All optimizations #2536

Draft

neekolas force-pushed the 09-17-add_sync_conversations_benches branch from 7b49736 to e485c09 Compare October 8, 2025 17:44

neekolas force-pushed the 09-10-add-get-latest-cursor-for-ids branch from 28482a2 to 7a8ab42 Compare October 8, 2025 17:44

neekolas mentioned this pull request Oct 8, 2025

[WIP] Message subscription overhaul #2443

Closed

neekolas force-pushed the 09-10-add-get-latest-cursor-for-ids branch from 7a8ab42 to b5fdb49 Compare October 13, 2025 23:54

neekolas force-pushed the 09-17-add_sync_conversations_benches branch from e485c09 to 3fedaac Compare October 13, 2025 23:54

neekolas changed the title ~~Message subscription overhaul~~ Add get_last_cursor_for_ids DB query Oct 13, 2025

neekolas force-pushed the 09-10-add-get-latest-cursor-for-ids branch from b5fdb49 to ea52b7e Compare October 14, 2025 00:03

neekolas marked this pull request as ready for review October 14, 2025 00:05

neekolas requested a review from a team as a code owner October 14, 2025 00:05

graphite-app bot changed the base branch from 09-17-add_sync_conversations_benches to graphite-base/2535 October 15, 2025 18:41

neekolas force-pushed the graphite-base/2535 branch from 3fedaac to 75492d6 Compare October 16, 2025 20:23

neekolas force-pushed the 09-10-add-get-latest-cursor-for-ids branch from ea52b7e to bd4342c Compare October 16, 2025 20:23

neekolas changed the base branch from graphite-base/2535 to main October 16, 2025 20:24

Message subscription overhaul

000ffda

neekolas force-pushed the 09-10-add-get-latest-cursor-for-ids branch from bd4342c to 000ffda Compare October 21, 2025 06:41

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add get_last_cursor_for_ids DB query #2535

Add get_last_cursor_for_ids DB query #2535

Uh oh!

neekolas commented Oct 1, 2025 •

edited

Loading

Uh oh!

claude bot commented Oct 1, 2025 •

edited

Loading

Uh oh!

neekolas commented Oct 1, 2025 •

edited

Loading

Uh oh!

macroscopeapp bot commented Oct 1, 2025 •

edited

Loading

Uh oh!

codecov bot commented Oct 1, 2025 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Add get_last_cursor_for_ids DB query #2535

Are you sure you want to change the base?

Add get_last_cursor_for_ids DB query #2535

Uh oh!

Conversation

neekolas commented Oct 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

TL;DR

What changed?

How to test?

Why make this change?

Uh oh!

claude bot commented Oct 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review Complete ✓

Overview

Code Quality & Best Practices ✅

Potential Issues 🔍

1. Schema Field Reference Issue (xmtp_db/src/encrypted_store/refresh_state.rs:358)

2. Test Code Uses Non-Existent Fields (Lines 1237-1468)

Performance Considerations ⚡

Security Concerns 🔒

Test Coverage 📊

Recommendations

Uh oh!

neekolas commented Oct 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

How to use the Graphite Merge Queue

Uh oh!

macroscopeapp bot commented Oct 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Add bulk QueryRefreshState::get_last_cursor_for_ids to return entity cursor mapping with 900-ID chunking in refresh_state.rs

📍Where to Start

Uh oh!

codecov bot commented Oct 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

neekolas commented Oct 1, 2025 •

edited

Loading

claude bot commented Oct 1, 2025 •

edited

Loading

neekolas commented Oct 1, 2025 •

edited

Loading

macroscopeapp bot commented Oct 1, 2025 •

edited

Loading

Add bulk `QueryRefreshState::get_last_cursor_for_ids` to return entity cursor mapping with 900-ID chunking in refresh_state.rs

codecov bot commented Oct 1, 2025 •

edited

Loading