32b, 4k ctx?
Is 4k the final context length planned for this model, or is there more in the works?
I really like what they did with the whole "fully open source" deal, but the 4k context length is indeed head-scratching. I'd also like to hear a word on this.
I can’t serve this model with a context length limited to 4K. A 4K context might be acceptable for smaller models (0.5B or 1B) intended for on-device use cases, but for a 32B model, I need it to support at least a 128K context window to achieve decent performance at 32K.
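For anyone who wants to confirm the limit before deciding whether it fits their serving setup, here's a minimal sketch; it assumes the published checkpoint `allenai/OLMo-2-0325-32B-Instruct` and that the window is exposed through the standard `max_position_embeddings` field in the Hugging Face config:

```python
from transformers import AutoConfig

# Assumed checkpoint name; substitute whichever repo/quant you actually serve.
config = AutoConfig.from_pretrained("allenai/OLMo-2-0325-32B-Instruct")

# Reads the advertised context window straight from the config
# (expected to report 4096 per this thread).
print("max_position_embeddings:", config.max_position_embeddings)
```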
Anyone else getting

```
ollama run MHKetbi/allenai_OLMo2-0325-32B-Instruct:Q8_0
Error: llama runner process has terminated: error loading model: check_tensor_dims: tensor 'blk.0.attn_k_norm.weight' has wrong shape; expected 5120, got 1024, 1, 1, 1
```
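In case it helps with debugging, here's a minimal sketch for printing the stored shape of the tensor named in the error, so you can compare it against the 5120 the runner expects. It assumes the `gguf` Python package that ships with llama.cpp and a hypothetical local path to the downloaded quant (adjust to wherever your tooling stored the file):

```python
from gguf import GGUFReader

# Hypothetical path to the local Q8_0 file; replace with your actual download location.
GGUF_PATH = "OLMo-2-0325-32B-Instruct-Q8_0.gguf"

reader = GGUFReader(GGUF_PATH)
for tensor in reader.tensors:
    # Only report the tensor the loader complains about.
    if tensor.name == "blk.0.attn_k_norm.weight":
        print(tensor.name, list(tensor.shape))
```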
Hi @lucyknada, we're working on making the context longer. We're definitely planning to do that in upcoming versions. Stay tuned! Thanks to everyone else for the feedback.
Hey @TheSeminal, there is an ongoing issue with llama.cpp. Check this out for more context: https://huggingface.co/allenai/OLMo-2-0325-32B-Instruct-GGUF/discussions/1