Plans to include additional models?
Hello, I'm just inquiring as to whether there are any plans to update this benchmark/leaderboard with additional models. Would there be any way for us to request models to be tested/benchmarked?
Hello! I'm doing my best to maintain the leaderboard with the time I have between other projects. 🙂
Absolutely — feel free to suggest models! Ideally, they should be runnable with vLLM and have a context length of at least ~8k tokens. You’re welcome to post suggestions here or open a new issue.
Would we be able to test the following models?
https://huggingface.co/shuttleai/shuttle-3.5
https://huggingface.co/THUDM/GLM-4-32B-0414
https://huggingface.co/Qwen/Qwen3-235B-A22B
https://huggingface.co/Qwen/Qwen3-30B-A3B
https://huggingface.co/Qwen/Qwen3-32B
https://huggingface.co/Qwen/Qwen3-8B
https://huggingface.co/Qwen/Qwen3-4B
These are recently released models for which I've seen creative writing benchmarks/evaluations, but hardly any on role play.
Edit: Added Qwen3-235B-A22B to the list