
32b, 4k ctx?

#2
by lucyknada - opened

Is 4k the final context length planned for this model, or is there more in the works?

I really like what they did with the whole "fully open source" deal, but the 4k context length is indeed head-scratching. I'd also like to hear a word on this.

*(screenshot of the Twitter post mentioning longer context coming "very soon")*

Self-replying, since this was apparently mentioned on Twitter but not here; what "very soon" means is another question.

I can’t serve this model with a context length limited to 4K. A 4K context might be acceptable for smaller models (0.5B or 1B) intended for on-device use cases, but for a 32B model, I need it to support at least a 128K context window to achieve decent performance at 32K.
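For what it's worth, the 4k limit is visible straight from the released config. A minimal sketch, assuming the checkpoint exposes `max_position_embeddings` the way most Hugging Face causal LMs do:

```python
from transformers import AutoConfig

# Read the context window advertised in the released config
cfg = AutoConfig.from_pretrained("allenai/OLMo-2-0325-32B-Instruct")
print(cfg.max_position_embeddings)  # 4096 for this release
```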

Anyone else getting

```
ollama run MHKetbi/allenai_OLMo2-0325-32B-Instruct:Q8_0
Error: llama runner process has terminated: error loading model: check_tensor_dims: tensor 'blk.0.attn_k_norm.weight' has wrong shape; expected 5120, got 1024, 1, 1, 1
```

Hi @lucyknada, we’re working on making the context longer, and we’re definitely planning to do that in upcoming versions. Stay tuned! Thanks to everyone else for the feedback.

> Anyone else getting
>
> ```
> ollama run MHKetbi/allenai_OLMo2-0325-32B-Instruct:Q8_0
> Error: llama runner process has terminated: error loading model: check_tensor_dims: tensor 'blk.0.attn_k_norm.weight' has wrong shape; expected 5120, got 1024, 1, 1, 1
> ```

Hey @TheSeminal, there’s an ongoing issue with llama.cpp. Check this out for more context: https://huggingface.co/allenai/OLMo-2-0325-32B-Instruct-GGUF/discussions/1
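For anyone who wants to verify what their local GGUF actually contains, here's a minimal sketch using the `gguf` Python package to print the shapes of the attention-norm tensors the loader complains about. The filename is a placeholder for wherever your Q8_0 file lives:

```python
from gguf import GGUFReader  # pip install gguf

# Print the shapes of the attention-norm tensors named in the error,
# so they can be compared against what llama.cpp expects.
reader = GGUFReader("OLMo-2-0325-32B-Instruct-Q8_0.gguf")  # placeholder path
for tensor in reader.tensors:
    if "attn_k_norm" in tensor.name or "attn_q_norm" in tensor.name:
        print(tensor.name, tensor.shape)
```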
