flowers-team/StickToYourRoleLeaderboard · Plans to include additional models?

Apr 21

Hello, I'm just inquiring as to whether there are any plans to further update this benchmark/leaderboard with additional models. Would there be any way for us to request models to be tested/benchmarked?

grg

Flowers Team Inria org Apr 22

Hello! I'm doing my best to maintain the leaderboard with the time I have between other projects. 🙂
Absolutely — feel free to suggest models! Ideally, they should be runnable with vLLM and have a context length of at least ~8k tokens. You’re welcome to post suggestions here or open a new issue.
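As a rough pre-check before suggesting a model, the context-length criterion above could be sketched as follows. This is only an illustration: the helper name `meets_context_requirement`, the 8192-token threshold (one reading of "~8k"), and the use of the `max_position_embeddings` key from a model's `config.json` are assumptions, and the check says nothing about whether vLLM can actually run the model.

```python
import json

MIN_CONTEXT_TOKENS = 8192  # assumed interpretation of "~8k tokens"


def meets_context_requirement(config: dict, min_tokens: int = MIN_CONTEXT_TOKENS) -> bool:
    """Return True if the config advertises a context window of at least min_tokens."""
    # max_position_embeddings is a common key in Hugging Face config.json files,
    # but not every architecture uses it, so a missing key is treated as unknown (False).
    limit = config.get("max_position_embeddings")
    return isinstance(limit, int) and limit >= min_tokens


# Hypothetical config contents for illustration, not taken from any real model.
example_config = json.loads('{"max_position_embeddings": 32768}')
print(meets_context_requirement(example_config))
```

In practice the dictionary would come from the `config.json` file in the model's repository; models that store their context length under a different key would need a manual look.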

SamuraiBarbi

May 1

edited May 1

Would we be able to test the following models?

https://huggingface.co/shuttleai/shuttle-3.5

https://huggingface.co/THUDM/GLM-4-32B-0414

https://huggingface.co/Qwen/Qwen3-235B-A22B

https://huggingface.co/Qwen/Qwen3-30B-A3B

https://huggingface.co/Qwen/Qwen3-32B

https://huggingface.co/Qwen/Qwen3-8B

https://huggingface.co/Qwen/Qwen3-4B

These are more recent releases. I've seen creative writing benchmarks/evaluations for them, but none really on role play.

Edit: Added Qwen3-235B-A22B to the list