---
base_model:
- alibaba-pai/DistilQwen2.5-DS3-0324-32B
---

vllm (pretrained=/root/autodl-tmp/DistilQwen2.5-DS3-0324-32B,add_bos_token=true,max_model_len=3096,dtype=bfloat16,tensor_parallel_size=4), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.564|±  |0.0314|
|     |       |strict-match    |     5|exact_match|↑  |0.376|±  |0.0307|

vllm (pretrained=/root/autodl-tmp/DistilQwen2.5-DS3-0324-32B,add_bos_token=true,max_model_len=5096,dtype=bfloat16,tensor_parallel_size=4), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.570|±  |0.0222|
|     |       |strict-match    |     5|exact_match|↑  |0.412|±  |0.0220|

vllm (pretrained=/root/autodl-tmp/DistilQwen2.5-DS3-0324-32B,add_bos_token=true,max_model_len=3048,dtype=bfloat16,tensor_parallel_size=4), gen_kwargs: (None), limit: 5.0, num_fewshot: None, batch_size: 1

|      Groups      |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|------------------|------:|------|------|------|---|-----:|---|-----:|
|mmlu              |      2|none  |      |acc   |↑  |0.8421|±  |0.0200|
| - humanities     |      2|none  |      |acc   |↑  |0.8308|±  |0.0421|
| - other          |      2|none  |      |acc   |↑  |0.8308|±  |0.0435|
| - social sciences|      2|none  |      |acc   |↑  |0.8833|±  |0.0333|
| - stem           |      2|none  |      |acc   |↑  |0.8316|±  |0.0380|

vllm (pretrained=/root/autodl-tmp/DistilQwen2.5-DS3-0324-32B-awq,add_bos_token=true,max_model_len=3096,dtype=bfloat16,tensor_parallel_size=4), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.508|±  |0.0317|
|     |       |strict-match    |     5|exact_match|↑  |0.380|±  |0.0308|

vllm (pretrained=/root/autodl-tmp/DistilQwen2.5-DS3-0324-32B-awq,add_bos_token=true,max_model_len=5096,dtype=bfloat16,tensor_parallel_size=4), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.542|±  |0.0223|
|     |       |strict-match    |     5|exact_match|↑  |0.394|±  |0.0219|

vllm (pretrained=/root/autodl-tmp/DistilQwen2.5-DS3-0324-32B-awq,add_bos_token=true,max_model_len=3048,dtype=bfloat16,tensor_parallel_size=4), gen_kwargs: (None), limit: 5.0, num_fewshot: None, batch_size: 1

|      Groups      |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|------------------|------:|------|------|------|---|-----:|---|-----:|
|mmlu              |      2|none  |      |acc   |↑  |0.8526|±  |0.0192|
| - humanities     |      2|none  |      |acc   |↑  |0.8308|±  |0.0421|
| - other          |      2|none  |      |acc   |↑  |0.8308|±  |0.0449|
| - social sciences|      2|none  |      |acc   |↑  |0.8833|±  |0.0333|
| - stem           |      2|none  |      |acc   |↑  |0.8632|±  |0.0333|
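
When reading the base and AWQ numbers side by side, it may help to check how the Stderr column relates to Value and `limit`, and whether a gap between two runs exceeds the reported noise. The sketch below is illustrative only: `binomial_stderr` and `z_score` are hypothetical helper names, the formula is an assumption that happens to reproduce the gsm8k Stderr values above rather than a statement of how the harness computes them, and the z-score treats the two runs as independent samples even though both were scored on the same questions.

```python
import math


def binomial_stderr(p: float, n: int) -> float:
    """Sample standard error of the mean of n 0/1 outcomes with mean p.

    Assumption: uses the (n - 1) denominator, which matches the gsm8k
    Stderr values reported above to four decimal places.
    """
    return math.sqrt(p * (1.0 - p) / (n - 1))


def z_score(p_a: float, se_a: float, p_b: float, se_b: float) -> float:
    """Difference of two scores in units of their combined standard error.

    Assumption: the two runs are treated as independent samples.
    """
    return (p_a - p_b) / math.sqrt(se_a ** 2 + se_b ** 2)


if __name__ == "__main__":
    # gsm8k flexible-extract, limit 500: (Value, Stderr) copied from the tables.
    base = (0.570, 0.0222)
    awq = (0.542, 0.0223)

    # Recompute the base model's Stderr from its Value and the limit.
    print(f"recomputed stderr: {binomial_stderr(base[0], 500):.4f}")

    # Express the base-vs-AWQ gap in combined standard errors.
    print(f"z-score base vs awq: {z_score(*base, *awq):.2f}")
```

Under these assumptions the 500-sample gsm8k flexible-extract gap between the base and AWQ checkpoints is under one combined standard error, so the limited runs above do not separate the two models on that metric.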