Reduce sqrt128 multiply cost: mul128By64, precompute target, 128-bit sig_z² #1337
justinzhuguangwen wants to merge 1 commit into boostorg:develop
Hi @ckormanyos @mborland, this is a follow-up micro-optimization on sqrt: I reviewed each step for redundant work.
decimal128 throughput is up 16%, and all tests pass.
An automated preview of the documentation is available at https://1337.decimal.prtest3.cppalliance.org/libs/decimal/doc/html/index.html
If more commits are pushed to the pull request, the docs will rebuild at the same URL.
2026-02-09 02:02:55 UTC
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@            Coverage Diff            @@
##           develop   #1337     +/-   ##
=========================================
+ Coverage     98.9%   98.9%   +0.1%
=========================================
  Files          283     283
  Lines        18327   18337     +10
  Branches      1939    1939
=========================================
+ Hits         18109   18120     +11
+ Misses         218     217      -1
... and 1 file with indirect coverage changes
Summary
Reduce redundant and oversized multiplications in sqrt implementations.
Key changes:
- sig_gx × scale is computed once; it was recomputed 2–4× per call.
- sig_z < √10 × 10³³ fits in 128 bits, so umul256(u128, u128) (4 muls) replaces u256 × u256 (16 muls via Knuth); this is called 4× per sqrt128.
Performance delta (this PR only, sqrt_bench.py, WSL2 x86_64, g++ -O3)
decimal32/64 results are within noise; the optimization targets the decimal128 multiply path.
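For readers unfamiliar with the 4-mul versus 16-mul distinction in the key changes, here is a minimal sketch of a full 128 × 128 → 256-bit product built from four 64 × 64 → 128 partial products. It is not the code in this PR: `U128`, `U256`, and `umul256_sketch` are hypothetical names for illustration, and the sketch assumes the GCC/Clang `unsigned __int128` extension for the 64-bit partial products. The narrowing is valid because √10 × 10³³ ≈ 3.2 × 10³³ is far below 2¹²⁸ ≈ 3.4 × 10³⁸, so both operands fit in two 64-bit words instead of four.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the library's 128-bit and 256-bit types.
struct U128 { std::uint64_t lo, hi; };
struct U256 { std::uint64_t w[4]; };  // w[0] is the least significant word

// Full 128 x 128 -> 256-bit product using four 64 x 64 -> 128 multiplies.
inline U256 umul256_sketch(U128 a, U128 b) {
    using u128 = unsigned __int128;  // GCC/Clang extension (assumption)

    // The four partial products of the two-word schoolbook method.
    const u128 ll = static_cast<u128>(a.lo) * b.lo;
    const u128 lh = static_cast<u128>(a.lo) * b.hi;
    const u128 hl = static_cast<u128>(a.hi) * b.lo;
    const u128 hh = static_cast<u128>(a.hi) * b.hi;

    U256 r{};
    r.w[0] = static_cast<std::uint64_t>(ll);

    // Word 1: high(ll) + low(lh) + low(hl). The sum is below 3 * 2^64,
    // so it cannot overflow the 128-bit accumulator.
    u128 acc = (ll >> 64)
             + static_cast<std::uint64_t>(lh)
             + static_cast<std::uint64_t>(hl);
    r.w[1] = static_cast<std::uint64_t>(acc);

    // Word 2: carry + high(lh) + high(hl) + low(hh).
    acc = (acc >> 64) + (lh >> 64) + (hl >> 64)
        + static_cast<std::uint64_t>(hh);
    r.w[2] = static_cast<std::uint64_t>(acc);

    // Word 3: carry + high(hh). The true product is below 2^256,
    // so this final sum fits in 64 bits.
    r.w[3] = static_cast<std::uint64_t>((acc >> 64) + (hh >> 64));
    return r;
}
```

A general 256 × 256 multiply on four-word operands needs 4 × 4 = 16 such partial products, which is where the saving comes from when the operands are known to fit in 128 bits.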
Overall speedup vs original baseline
Test
All existing sqrt tests pass (test_sqrt, test_cmath, test_constants, github_issue_1107, github_issue_1110).