32b, 4k ctx?
Is 4K the final context length planned for this model, or is more in the works?
I really like what they did with the whole "fully open source" deal, but the 4K context length is indeed head-scratching. I'd also like to hear a word on this.
I can’t serve this model with a context length limited to 4K. A 4K context might be acceptable for smaller models (0.5B or 1B) intended for on-device use cases, but for a 32B model, I need it to support at least a 128K context window to achieve decent performance at 32K.
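For what it's worth, the limit is visible right in the released config; a minimal sketch to check it yourself (the printed value is what I'd expect based on this thread, not something I've re-verified):

```python
from transformers import AutoConfig

# Read the advertised context window straight from the released config.
cfg = AutoConfig.from_pretrained("allenai/OLMo-2-0325-32B-Instruct")
print(cfg.max_position_embeddings)  # expected to print 4096 per this thread
```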
Anyone else getting this?

```
ollama run MHKetbi/allenai_OLMo2-0325-32B-Instruct:Q8_0
Error: llama runner process has terminated: error loading model: check_tensor_dims: tensor 'blk.0.attn_k_norm.weight' has wrong shape; expected 5120, got 1024, 1, 1, 1
```
Hey @TheSeminal, there's an ongoing issue with llama.cpp. Check this out for more context: https://huggingface.co/allenai/OLMo-2-0325-32B-Instruct-GGUF/discussions/1
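If I'm reading the 32B config right (hidden size 5120, 40 attention heads, 8 KV heads — treat these as assumptions and verify against config.json), the arithmetic matches the error exactly; a quick sanity-check sketch:

```python
# Hypothetical sanity check of the shape mismatch in the error above.
# Config values are my reading of the 32B config; verify against config.json.
hidden_size = 5120
num_attention_heads = 40
num_key_value_heads = 8                         # grouped-query attention
head_dim = hidden_size // num_attention_heads   # 128

# OLMo 2 normalizes keys after the K projection, so k_norm spans the KV width:
k_norm_dim = num_key_value_heads * head_dim     # 8 * 128 = 1024 -> "got 1024"

# A loader that assumes k_norm spans the full hidden size would instead
# demand hidden_size                                          -> "expected 5120"
print(k_norm_dim, hidden_size)
```

In other words, the GGUF itself looks fine; it's the loader's expected shape that's stale, which is presumably why updating llama.cpp/ollama once the fix lands is the remedy suggested in that discussion.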
Hi @lucyknada, we're working on making the context longer. We're definitely planning to do that in upcoming versions. Stay tuned! Thanks to everyone else for the feedback.
That's fantastic to hear. Any chance there is a rough timeline for when that might happen? Thank you again for all you do!
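Until a long-context release lands, a stopgap some people experiment with is RoPE scaling. A rough, untested sketch with transformers — whether the OLMo 2 port honors a rope_scaling override, and what factor keeps quality acceptable, are assumptions to verify, and it's no substitute for actual long-context training:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMo-2-0325-32B-Instruct"
tok = AutoTokenizer.from_pretrained(model_id)

# Hypothetical YaRN-style override; unverified for OLMo 2 and likely to
# degrade quality without long-context fine-tuning.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    rope_scaling={
        "rope_type": "yarn",
        "factor": 4.0,                            # 4096 -> ~16K positions
        "original_max_position_embeddings": 4096,
    },
    max_position_embeddings=16384,
)
```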