Skip to content

New inverse stream map to accelerate convergence#1919

Open
unalmis wants to merge 33 commits intomasterfrom
ku/bounce
Open

New inverse stream map to accelerate convergence#1919
unalmis wants to merge 33 commits intomasterfrom
ku/bounce

Conversation

@unalmis
Copy link
Collaborator

@unalmis unalmis commented Sep 18, 2025

PR Age

Inverse stream maps

Improvements

  • Check-pointing to increase speed and reduce memory consumption of reverse mode differentiation Checkpointing to reduce reverse mode AD memory usage #1347.
  • Adds low_ram mode which is same speed and less memory for objective.compute, but slower for objective.grad since JAX is poor at iterative algorithms.
  • Fully resolves Memory regression in bounce integrals #1864 by avoiding materialization of a large tensor in memory. Previously, we had closed the issue by adding nuffts as a workaround. The improvement here actually solves the JAX regression.
  • Reuses some computations in identifying bounce points to make more efficient.
  • Increase cache hits, fusing, and reduce floating point error in computing bounce points (very important for accurate integrals).
  • Transforms an improper field line integral to one on a compact domain where the integrand is periodic to achieve faster convergence.
  • Improves performance complexity of interp_to_argmin for Bounce2D from fourth order to spectral as required for Alpert quadrature.
  • Resolves Use OOP for surface integrals with faster methods for tensor product grids #1389.

Usability

Bugs

Benchmarks

Just go to #2026 and run effective_ripple_profile.py. You will see the large performance improvement from master. The CI benchmarks do not reveal this because those benchmarks are essentially just noise. Note that, using the same parameter inputs, the resolution of this branch is also higher than master due to the faster convergence.

  • If you set use_bounce1d=True on that script, you will run out of memory as expected since it is an inferior approach (as expected, you get the OOM in the jacobian before you compute a single bounce integral).
  • If you set nufft_eps=0, you need 175 GB to run that script on master (you'll get an OOM and JAX will tell you it needs 175GB), but only 35 GB on this branch.
  • Using nuffts, the script requires only 6.7 GB on this branch.

Examples

HELIOTRON

Master branch

test_theta_chebyshev_HELIOTRON

This branch

test_delta_fourier_chebyshev_HELIOTRON

W7-X

Master branch

test_theta_chebyshev_W7-X

This branch

test_delta_fourier_chebyshev_W7-X

NCSX

Master branch

test_theta_chebyshev_NCSX

This branch

test_delta_fourier_chebyshev_NCSX

Removal of spectral aliasing

Figure_1

@unalmis unalmis self-assigned this Sep 18, 2025
@unalmis unalmis added performance New feature or request to make the code faster robustness Make the code more robust labels Sep 18, 2025
@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@unalmis unalmis changed the base branch from master to ku/nufft September 18, 2025 07:45
@unalmis unalmis marked this pull request as draft September 18, 2025 07:47
@unalmis unalmis added the theory Requires theory work before coding label Sep 18, 2025
@unalmis unalmis changed the title New inverse stream maps to accelerate convergence New inverse stream map to accelerate convergence Sep 18, 2025
@unalmis unalmis added the bug fix Something was fixed label Sep 18, 2025
@github-actions
Copy link
Contributor

github-actions bot commented Sep 18, 2025

Memory benchmark result

|               Test Name                |      %Δ      |    Master (MB)     |      PR (MB)       |    Δ (MB)    |    Time PR (s)     |  Time Master (s)   |
| -------------------------------------- | ------------ | ------------------ | ------------------ | ------------ | ------------------ | ------------------ |
  test_objective_jac_w7x                 |   -2.85 %    |     3.960e+03      |     3.847e+03      |   -113.00    |       39.03        |       36.42        |
  test_proximal_jac_w7x_with_eq_update   |   -0.16 %    |     6.489e+03      |     6.479e+03      |    -10.11    |       156.32       |       158.97       |
  test_proximal_freeb_jac                |    0.21 %    |     1.318e+04      |     1.321e+04      |    27.09     |       82.97        |       83.59        |
  test_proximal_freeb_jac_blocked        |    0.91 %    |     7.475e+03      |     7.543e+03      |    67.92     |       71.84        |       73.66        |
  test_proximal_freeb_jac_batched        |    0.32 %    |     7.491e+03      |     7.516e+03      |    24.09     |       70.23        |       72.14        |
  test_proximal_jac_ripple               |   -5.86 %    |     3.607e+03      |     3.395e+03      |   -211.50    |       60.24        |       65.15        |
  test_proximal_jac_ripple_bounce1d      |   -3.28 %    |     3.605e+03      |     3.487e+03      |   -118.13    |       73.08        |       76.09        |
  test_eq_solve                          |    1.05 %    |     2.006e+03      |     2.027e+03      |    21.12     |       92.28        |       94.30        |

For the memory plots, go to the summary of Memory Benchmarks workflow and download the artifact.

@unalmis unalmis removed bug fix Something was fixed theory Requires theory work before coding labels Sep 18, 2025
@unalmis unalmis changed the title New inverse stream map to accelerate convergence better inverse stream map to accelerate convergence Sep 20, 2025
@unalmis unalmis changed the title better inverse stream map to accelerate convergence new inverse stream map to accelerate convergence Sep 20, 2025
@unalmis unalmis added the theory Requires theory work before coding label Sep 20, 2025
@unalmis unalmis force-pushed the ku/bounce branch 2 times, most recently from a6d949b to d685405 Compare September 22, 2025 04:33
@unalmis unalmis removed the theory Requires theory work before coding label Sep 22, 2025
@unalmis unalmis added the P3 Highest Priority, someone is/should be actively working on this label Sep 23, 2025
@unalmis unalmis dismissed f0uriest’s stale review September 23, 2025 08:44

addressed request

@unalmis unalmis marked this pull request as ready for review September 23, 2025 08:44
@unalmis unalmis requested review from a team, f0uriest and rahulgaur104 and removed request for a team September 23, 2025 08:44
Copy link
Collaborator

@YigitElma YigitElma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume there is no math error, I couldn't check it very well because it is really hard to follow the code with all the different angle calls and some of function names are ambigious.

Are you planning to maintain the interpax_fft package?

@unalmis unalmis mentioned this pull request Jan 11, 2026
@unalmis
Copy link
Collaborator Author

unalmis commented Jan 21, 2026

I assume there is no math error...

Yes, and I have resolved your other comments concerning cosmetics in the code comments.

When drafting a reply to a reviewer comment, I realized that the atomic
derivative computed by the autodiff tool for the `spline=True` option is
not correct if the bounce point lies near a local maxima. The
`spline=False` option is fine. It is unlikely this would have affected
optimization. 

See section 3 of 

[autodiff.pdf](https://github.com/user-attachments/files/24988182/autodiff.pdf)
@unalmis unalmis mentioned this pull request Feb 26, 2026
3 tasks
@unalmis unalmis mentioned this pull request Feb 26, 2026
6 tasks
@unalmis
Copy link
Collaborator Author

unalmis commented Feb 27, 2026

Someone else who is still working on desc will need to maintain this PR from now on.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug fix Something was fixed easy Short and simple to code or review P3 Highest Priority, someone is/should be actively working on this performance New feature or request to make the code faster robustness Make the code more robust run_benchmarks Run timing benchmarks on this PR against current master branch stable Awaiting merge to master. Only updates will be merging from master.

Projects

None yet

4 participants