【Evaluation】 Best practices for evaluating Qwen3!

#2
by wangxingjun778 - opened

For more details, please refer to: https://evalscope.readthedocs.io/en/latest/best_practice/qwen3.html
Powered by: EvalScope https://github.com/modelscope/evalscope

  1. Speed Benchmark

image.png

image.png

  2. Benchmark collection (for evaluating abilities such as code, understanding, instruction following, math, ...)

    NOTE: The results are based on samples of the original benchmarks, drawn with the eval argument --limit.

image.png

  3. Thinking efficiency of Qwen3

image.png

image.png

  4. Run Gradio visualization
evalscope app

image.png
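As a rough illustration of the sampled evaluation mentioned in the NOTE above, here is a minimal sketch that only assembles an `evalscope eval` command line with `--limit`; it does not run anything. The model ID `Qwen/Qwen3-8B`, the dataset names, and the limit of 10 are placeholders chosen for the example, not the settings behind the results in this post, and `build_eval_command` is a hypothetical helper. The linked best-practice page has the exact configuration.

```python
import shlex


def build_eval_command(model, datasets, limit):
    """Return the argv list for a sampled `evalscope eval` run.

    `build_eval_command` is a hypothetical helper for this sketch;
    it only assembles the CLI arguments and does not invoke EvalScope.
    """
    if limit <= 0:
        raise ValueError("limit must be a positive integer")
    return [
        "evalscope", "eval",
        "--model", model,           # placeholder model ID (assumption)
        "--datasets", *datasets,    # placeholder dataset names (assumption)
        "--limit", str(limit),      # cap on samples evaluated per dataset
    ]


cmd = build_eval_command("Qwen/Qwen3-8B", ["gsm8k", "ifeval"], limit=10)
print(shlex.join(cmd))
```

The printed line can be pasted into a shell once EvalScope is installed. Keeping `--limit` small makes a quick smoke test, at the cost of noisier scores than a full-benchmark run.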

Get started and have fun! :)

Do you have the resources to test this fine-tuned model? Thanks.
https://huggingface.co/wzx111/Qwen3-1.7B-MATH-GDPO
