YaRN scaling to 122k context length

#5
by nbroad - opened

Please don't merge or close this. I'm only using this PR revision to run the model at a 122k sequence length.
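For anyone wanting to do something similar, here is a rough sketch of the two pieces involved: a YaRN `rope_scaling` entry and the Hub ref name for a PR revision. It is not the actual diff in this PR. The native context length (32768), the reading of "122k" as 122,880 tokens, and the helper names `yarn_rope_scaling` and `pr_revision` are all assumptions made for illustration.

```python
def yarn_rope_scaling(target_len: int, original_len: int) -> dict:
    """Build a YaRN rope_scaling dict in the style used by Transformers configs.

    The factor is the ratio of the target context length to the native one.
    """
    if target_len <= original_len:
        raise ValueError("target_len must exceed original_len for YaRN scaling")
    return {
        "rope_type": "yarn",
        "factor": target_len / original_len,
        "original_max_position_embeddings": original_len,
    }


def pr_revision(pr_number: int) -> str:
    """Return the git ref under which the Hub exposes a pull request."""
    return f"refs/pr/{pr_number}"


# Assumption: the model's native context window is 32768 tokens.
ORIGINAL_LEN = 32768
# Assumption: "122k" is read here as 120 * 1024 = 122,880 tokens.
TARGET_LEN = 120 * 1024

scaling = yarn_rope_scaling(TARGET_LEN, ORIGINAL_LEN)
revision = pr_revision(5)

print(scaling)
print(revision)
```

The `revision` string is what you would pass as the `revision` argument to `from_pretrained` in Transformers to load from this PR instead of `main`. Check the PR's own `config.json` for the values it actually sets.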
