
perf: cut train-loop sync/retrace overheads in OneVision Encoder#93

Open
Luodian wants to merge 5 commits into main from dev/perf-train-hotpath

Conversation


@Luodian Luodian commented Feb 10, 2026

summary

  • fix grad-accum step boundary + scheduler stepping alignment
  • cut per-step sync points in training/train.py (remove hot-path .item() / host sync patterns)
  • add safer compile control: --compile_backend {auto,none,inductor,aot_eager,eager}
  • in auto, disable compile on mixed dali_type to avoid retrace storms
  • cache RoPE frequency grids and skip dense-path identity gather in encoder forward
  • reduce dataloader overhead in data_v2* wrappers (drop redundant .cuda(), raise prefetch/thread defaults)
  • reduce per-step frame-sampling allocation pressure in residual branch
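The hot-path sync removal above can be sketched as follows. This is a hypothetical illustration, not the PR's actual code: instead of calling `loss.item()` every step (which forces a GPU→CPU sync), the running loss is kept as a tensor and read back only once per logging interval. `train_steps` and `LOG_INTERVAL` are invented names for this sketch.

```python
# Hypothetical sketch of avoiding per-step .item() host syncs.
import torch

LOG_INTERVAL = 4  # assumed logging cadence for illustration


def train_steps(losses):
    """losses: iterable of scalar loss tensors produced by the model."""
    running = torch.zeros(())  # would live on the GPU in real training code
    logged = []
    for step, loss in enumerate(losses, start=1):
        running += loss.detach()  # tensor-tensor add: no host sync
        if step % LOG_INTERVAL == 0:
            # single GPU->CPU sync per interval instead of per step
            logged.append(running.item() / LOG_INTERVAL)
            running.zero_()
    return logged
```

The same pattern applies to any per-step scalar (grad norms, accuracy counters): keep it on-device and materialize it on the host only when logging.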

why

  • less GPU<->CPU sync in hot path
  • fewer dynamic graph recompiles under mixed input signatures
  • less repeated tensor construction / memory traffic per step
  • better input pipeline overlap for image branches
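The `--compile_backend` behavior described in the summary could look roughly like the helper below. The flag values come from this PR; the resolver function, its signature, and the mixed-`dali_type` check are assumptions for illustration only.

```python
# Hypothetical resolver for the --compile_backend flag.
def resolve_compile_backend(choice, dali_types):
    """Map {auto,none,inductor,aot_eager,eager} to a torch.compile backend name,
    or None when compilation should be skipped entirely."""
    if choice == "none":
        return None
    if choice == "auto":
        # Mixed dali_type values imply mixed input signatures; compiling there
        # would trigger repeated retraces, so 'auto' disables compile.
        if len(set(dali_types)) > 1:
            return None
        return "inductor"
    return choice  # explicit backend: inductor / aot_eager / eager
```

With a resolver like this, the training script would wrap the model in `torch.compile(model, backend=...)` only when the result is not `None`.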

replay & exact-match notes

  • This PR is performance-oriented and does not guarantee byte-exact identity against historical runs under default settings.
  • For closest parity with previous behavior, run with:
    • --compile_backend none (disable torch.compile wrapper)
  • If you need strict checkpoint-comparison parity too, keep all non-performance knobs unchanged (dataset list/order, random seeds, worker layout, and DALI environment/config).
  • Note: if your environment previously used the old data_v2.py / data_v2_ocr.py default of prefetch_queue_depth=1, this PR changes it to 3; this can change data-delivery timing while leaving sample values unchanged.
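Why a deeper prefetch queue shifts timing but not values can be shown with a minimal stdlib analogue of `prefetch_queue_depth` (this is an illustration, not DALI code): a bounded queue lets the producer run ahead of the consumer, yet samples are still delivered in order.

```python
# Stdlib sketch: a bounded prefetch queue preserves sample order at any depth.
import queue
import threading


def prefetch(samples, depth):
    """Deliver samples through a producer thread with a bounded queue of size
    `depth` (the prefetch_queue_depth analogue). Order is preserved regardless
    of depth; only how far the producer runs ahead changes."""
    q = queue.Queue(maxsize=depth)
    sentinel = object()

    def producer():
        for s in samples:
            q.put(s)
        q.put(sentinel)

    threading.Thread(target=producer, daemon=True).start()
    out = []
    while True:
        item = q.get()
        if item is sentinel:
            break
        out.append(item)
    return out
```

Running this with depth 1 and depth 3 yields identical output sequences, which is why the default change affects pipeline overlap but not sample values.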

validation

  • python3 -m py_compile training/train.py onevision_encoder/modeling_onevision_encoder.py dataloader/data_v2.py dataloader/data_v2_ocr.py dataloader/data_v2_multi_res.py

