fix FP16 tests precision overflow [MOD-13429] #884
base: main
Conversation
Pull request overview
This PR addresses FP16 precision overflow issues in vector similarity search tests by introducing scale factors to prevent distance calculations from exceeding FP16's maximum value (~65504). The changes ensure test vectors remain within safe numerical ranges while preserving test correctness.
Key changes:
- Added scale factor (0.1f) to SVS tiered index batch iterator tests to prevent overflow when n=1000
- Added scale factor (0.01f) to FP16 override and batch iterator tests to prevent overflow when n=250
- Upgraded ARM CI infrastructure from t4g.medium to r8g.xlarge for better performance
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| tests/unit/test_svs_fp16.cpp | Introduced 0.1f scale factor to prevent FP16 overflow in tiered batch iterator tests with proper documentation |
| tests/unit/test_fp16.cpp | Added 0.01f scale factor to test_override and test_batch_iterator_basic, plus tolerance-based comparison for FP16 precision loss |
| .github/workflows/arm.yml | Upgraded EC2 instance type to r8g.xlarge for improved CI performance |
```cpp
size_t n = 100;
size_t new_n = 250;
// Scale factor to avoid FP16 overflow. FP16 max value is 65504, and L2² = dim × diff².
// With scale=0.4 and max diff=250: L2² = 4 × (250×0.4)² = 40000 < 65504.
```
Copilot AI, Jan 8, 2026
The comment states "With scale=0.4 and max diff=250" but the actual scale factor used is 0.01f, not 0.4. The math in the comment should be updated to reflect the actual scale value. With scale=0.01 and new_n=250, the max distance would be 4 × (250×0.01)² = 4 × 6.25 = 25, which is well below the FP16 max.
Suggested change:

```diff
- // With scale=0.4 and max diff=250: L2² = 4 × (250×0.4)² = 40000 < 65504.
+ // With scale=0.01 and max diff=250: L2² = 4 × (250×0.01)² = 25 < 65504.
```
```cpp
void FP16Test::test_batch_iterator_basic(params_t params) {
    size_t n = params.initialCapacity;
    // Scale factor to avoid FP16 overflow. FP16 max value is 65504, and L2² = dim × diff².
    // With scale=0.4 and max diff=250: L2² = 4 × (250×0.4)² = 40000 < 65504.
```
Copilot AI, Jan 8, 2026
The comment states "With scale=0.4 and max diff=250" but the actual scale factor used is 0.01f, not 0.4. The math in the comment should be updated to reflect the actual scale value. With scale=0.01 and n=250, the max distance would be 4 × (250×0.01)² = 4 × 6.25 = 25.
Suggested change:

```diff
- // With scale=0.4 and max diff=250: L2² = 4 × (250×0.4)² = 40000 < 65504.
+ // With scale=0.01 and max diff=250: L2² = 4 × (250×0.01)² = 4 × 6.25 = 25 < 65504.
```
```cpp
float exp_score = 4 * diff * diff;
ASSERT_EQ(score, exp_score) << "id: " << id << " score: " << score;
// Use tolerance-based comparison due to FP16 precision loss in SVE accumulation.
// FP16 has ~3 decimal digits of precision, so we allow ~0.1% relative tolerance.
```
Copilot AI, Jan 8, 2026
The comment states "~0.1% relative tolerance" but the code uses 0.002f which equals 0.2%. The comment should be updated to match the actual tolerance value used in the calculation.
Suggested change:

```diff
- // FP16 has ~3 decimal digits of precision, so we allow ~0.1% relative tolerance.
+ // FP16 has ~3 decimal digits of precision, so we allow ~0.2% relative tolerance.
```
Codecov Report

✅ All modified and coverable lines are covered by tests.

```
@@           Coverage Diff            @@
##            main     #884     +/-  ##
==========================================
+ Coverage   97.13%   97.18%   +0.04%
==========================================
  Files         129      130       +1
  Lines        7614     7767     +153
==========================================
+ Hits         7396     7548     +152
- Misses        218      219       +1
```
Describe the changes in the pull request
This PR fixes FP16 SVS tests' precision and assertion issues:
Tests updated:
- svs_bulk_vectors_add_delete_test: Fixed size assertions and added verification of complete deletion
- svs_reindexing_same_vector: Fixed assertions for index size after deletions and reinsertions
- svs_reindexing_same_vector_different_id: Same lazy deletion fixes
- test_delete_vector: Fixed single vector deletion assertion
- TieredBatchIteratorBasic: Added scale factor to avoid FP16 overflow
- KNNSearch: Use indexLabelCount() instead of indexSize() for SVS index assertions

Which issues this PR fixes
Main objects this PR modified
Mark if applicable