Training Time Details for 3B and 7B Models #19

@naajeehxe

Description

Hi, thank you for sharing this great work.

I noticed in the paper that training was conducted using eight H100 GPUs. Would you mind sharing approximately how long training took for the 3B and 7B models, separately for the SFT and RL stages?

This information would be very helpful for understanding the practical training cost and for reproduction efforts. Thank you in advance!
