Clarification on `rope_theta` in the checkpoints

#6 opened by mjkmain

Hi OLMo team and community,

First, thank you for openly sharing the OLMo-2-0325-32B model and the accompanying technical report; your transparency is invaluable to the research community.

In the report, you mention increasing RoPE theta from 10,000 to 500,000. When I inspect the `model_config.json` of the `stage1-step1000-tokens9B` checkpoint, I see:

"rope_theta": 500000

Could you clarify whether this checkpoint was indeed trained with `rope_theta=500000`? If so, is it safe to run inference directly with this configuration, or are there any additional considerations we should keep in mind?
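
For reference, this is how I'm reading the value. Just a quick sketch; I'm assuming the checkpoint is published as the `stage1-step1000-tokens9B` revision of `allenai/OLMo-2-0325-32B` on the Hub:

```python
# Minimal sketch: read rope_theta from the checkpoint's config via transformers.
# Assumption: the intermediate checkpoint is exposed as a Hub revision named
# "stage1-step1000-tokens9B" under allenai/OLMo-2-0325-32B.
from transformers import AutoConfig

config = AutoConfig.from_pretrained(
    "allenai/OLMo-2-0325-32B",
    revision="stage1-step1000-tokens9B",
)
print(config.rope_theta)  # expecting 500000, matching model_config.json
```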

Thanks again for your outstanding open-source contribution. Looking forward to your guidance!

Best regards,
Minjun Kim

Hello, thanks for the kind words and the question!

Can I ask where you're seeing the mention of increasing RoPE theta from 10,000 to 500,000? I'm wondering if you're reading it from the RoPE theta section of this paper, https://arxiv.org/pdf/2501.00656, which says (page 6):

RoPE theta = 5e5: We increase the RoPE to 500,000 from 10,000. This approach increases the resolution of positional encoding, matching Grattafiori et al. (2024).

This section is explaining how the OLMo2 models differ from the previous iteration, where `rope_theta` was 10,000. For OLMo2, `rope_theta` should be 500,000. And yup, it should be safe to run inference with this config!
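
If it helps, a quick inference sanity check could look something like this. It's a rough sketch, assuming a recent transformers release with OLMo2 support and the revision name from your question; adjust dtype/device for your hardware:

```python
# Rough sketch of running inference on the intermediate checkpoint.
# Assumption: the checkpoint is available as the Hub revision named below.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "allenai/OLMo-2-0325-32B"
rev = "stage1-step1000-tokens9B"

tokenizer = AutoTokenizer.from_pretrained(repo, revision=rev)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    revision=rev,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Generate a short continuation as a smoke test.
inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```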

Hi OLMo team,

Thanks so much for the quick clarification!

I realize I’d misunderstood the note about RoPE theta. I initially thought theta started at 10,000 during early training and was later increased to 500,000 within OLMo-2. Your explanation makes it clear that 500,000 is the setting throughout OLMo-2, and the 10,000 figure refers to the earlier iteration. Got it!

Appreciate the help and the great work you’re doing.

Thank you!
