noneUsername/DistilQwen2.5-DS3-0324-32B-awq

vllm (pretrained=/root/autodl-tmp/DistilQwen2.5-DS3-0324-32B,add_bos_token=true,max_model_len=3096,dtype=bfloat16,tensor_parallel_size=4), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

Tasks	Version	Filter	n-shot	Metric		Value		Stderr
gsm8k	3	flexible-extract	5	exact_match	↑	0.564	±	0.0314
		strict-match	5	exact_match	↑	0.376	±	0.0307
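
The Stderr column can be sanity-checked from the Value column and the sample limit. The sketch below assumes the harness reports the standard error of a mean of 0/1 scores using the sample (n - 1) variance. That assumption reproduces the figures in the table above, but it is not stated on this card, and `binomial_stderr` is a hypothetical helper name.

```python
import math


def binomial_stderr(accuracy: float, n: int) -> float:
    """Standard error of the mean of n 0/1 scores.

    Assumption: the sample variance (divide by n - 1) is used, so
    stderr = sqrt(p * (1 - p) / (n - 1)).
    """
    return math.sqrt(accuracy * (1.0 - accuracy) / (n - 1))


# Values taken from the gsm8k table above (limit = 250 questions).
for label, acc in [("flexible-extract", 0.564), ("strict-match", 0.376)]:
    print(f"{label}: {binomial_stderr(acc, 250):.4f}")
```

The same formula with n = 500 matches the Stderr values reported for the longer gsm8k runs.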

vllm (pretrained=/root/autodl-tmp/DistilQwen2.5-DS3-0324-32B,add_bos_token=true,max_model_len=5096,dtype=bfloat16,tensor_parallel_size=4), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

Tasks	Version	Filter	n-shot	Metric		Value		Stderr
gsm8k	3	flexible-extract	5	exact_match	↑	0.570	±	0.0222
		strict-match	5	exact_match	↑	0.412	±	0.0220

vllm (pretrained=/root/autodl-tmp/DistilQwen2.5-DS3-0324-32B,add_bos_token=true,max_model_len=3048,dtype=bfloat16,tensor_parallel_size=4), gen_kwargs: (None), limit: 5.0, num_fewshot: None, batch_size: 1

Groups	Version	Filter	Metric		Value		Stderr
mmlu	2	none	acc	↑	0.8421	±	0.0200
- humanities	2	none	acc	↑	0.8308	±	0.0421
- other	2	none	acc	↑	0.8308	±	0.0435
- social sciences	2	none	acc	↑	0.8833	±	0.0333
- stem	2	none	acc	↑	0.8316	±	0.0380
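
The configuration lines above are the summaries the harness prints, not the commands that were run. As a rough sketch, they can be mapped back to `lm_eval` command lines as shown below. The helper `lm_eval_command` is hypothetical, the exact invocation used for this card is not recorded here, and running the result requires the model to exist at the local path along with enough GPUs for `tensor_parallel_size=4`.

```python
import shlex
from typing import List, Optional


def lm_eval_command(model_args: str, task: str, limit: int,
                    num_fewshot: Optional[int], batch_size: str) -> List[str]:
    """Build an lm_eval argv list from the fields of a printed config line."""
    argv = [
        "lm_eval",
        "--model", "vllm",
        "--model_args", model_args,
        "--tasks", task,
        "--limit", str(limit),
        "--batch_size", batch_size,
    ]
    # num_fewshot: None in the printed config means the flag was not set.
    if num_fewshot is not None:
        argv += ["--num_fewshot", str(num_fewshot)]
    return argv


BASE_ARGS = ("pretrained=/root/autodl-tmp/DistilQwen2.5-DS3-0324-32B,"
             "add_bos_token=true,max_model_len=3096,dtype=bfloat16,"
             "tensor_parallel_size=4")

gsm8k_cmd = lm_eval_command(BASE_ARGS, "gsm8k", 250, 5, "auto")
mmlu_cmd = lm_eval_command(BASE_ARGS.replace("3096", "3048"), "mmlu", 5, None, "1")

print(shlex.join(gsm8k_cmd))
print(shlex.join(mmlu_cmd))
```

For the quantized runs, the only change would be pointing `pretrained=` at the `-awq` directory.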

vllm (pretrained=/root/autodl-tmp/DistilQwen2.5-DS3-0324-32B-awq,add_bos_token=true,max_model_len=3096,dtype=bfloat16,tensor_parallel_size=4), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

Tasks	Version	Filter	n-shot	Metric		Value		Stderr
gsm8k	3	flexible-extract	5	exact_match	↑	0.508	±	0.0317
		strict-match	5	exact_match	↑	0.380	±	0.0308

vllm (pretrained=/root/autodl-tmp/DistilQwen2.5-DS3-0324-32B-awq,add_bos_token=true,max_model_len=5096,dtype=bfloat16,tensor_parallel_size=4), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

Tasks	Version	Filter	n-shot	Metric		Value		Stderr
gsm8k	3	flexible-extract	5	exact_match	↑	0.542	±	0.0223
		strict-match	5	exact_match	↑	0.394	±	0.0219

vllm (pretrained=/root/autodl-tmp/DistilQwen2.5-DS3-0324-32B-awq,add_bos_token=true,max_model_len=3048,dtype=bfloat16,tensor_parallel_size=4), gen_kwargs: (None), limit: 5.0, num_fewshot: None, batch_size: 1

Groups	Version	Filter	Metric		Value		Stderr
mmlu	2	none	acc	↑	0.8526	±	0.0192
- humanities	2	none	acc	↑	0.8308	±	0.0421
- other	2	none	acc	↑	0.8308	±	0.0449
- social sciences	2	none	acc	↑	0.8833	±	0.0333
- stem	2	none	acc	↑	0.8632	±	0.0333
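
To judge whether the gaps between the bf16 and AWQ results are larger than sampling noise, one rough approach is a two-sample z-score built from the reported values and standard errors. The sketch below treats the two runs as independent samples. Both runs likely use the same questions, so a paired test would be more appropriate. `z_score` is a hypothetical helper, and the result is only an approximate guide.

```python
import math


def z_score(base: float, base_se: float, quant: float, quant_se: float) -> float:
    """Difference (base - quant) divided by the combined standard error.

    Assumption: the two estimates are independent.
    """
    return (base - quant) / math.sqrt(base_se ** 2 + quant_se ** 2)


# (bf16 value, bf16 stderr, AWQ value, AWQ stderr), copied from the tables above.
rows = {
    "gsm8k flexible-extract (limit 250)": (0.564, 0.0314, 0.508, 0.0317),
    "gsm8k flexible-extract (limit 500)": (0.570, 0.0222, 0.542, 0.0223),
    "gsm8k strict-match (limit 500)": (0.412, 0.0220, 0.394, 0.0219),
    "mmlu acc": (0.8421, 0.0200, 0.8526, 0.0192),
}

for name, values in rows.items():
    print(f"{name}: z = {z_score(*values):+.2f}")
```

Under these assumptions every |z| stays below 1.96, so none of the listed differences is distinguishable from noise at the usual 5% level with these sample sizes.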
