Hi, thank you for sharing this great work.
I noticed in the paper that training was conducted on eight H100 GPUs. Would you mind sharing approximately how long training took for the 3B and 7B models, broken down by the SFT and RL stages?
This information would be very helpful for understanding the practical training cost and for reproduction efforts. Thank you in advance!