---
base_model:
- agentica-org/DeepCoder-14B-Preview
---

vllm (pretrained=/root/autodl-tmp/DeepCoder-14B-Preview,add_bos_token=true,max_model_len=3096,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.732|±  |0.0281|
|     |       |strict-match    |     5|exact_match|↑  |0.856|±  |0.0222|

vllm (pretrained=/root/autodl-tmp/DeepCoder-14B-Preview,add_bos_token=true,max_model_len=3096,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.766|±  |0.0190|
|     |       |strict-match    |     5|exact_match|↑  |0.856|±  |0.0157|

vllm (pretrained=/root/autodl-tmp/DeepCoder-14B-Preview,add_bos_token=true,max_model_len=3096,dtype=bfloat16), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: 1

|      Groups      |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|------------------|------:|------|------|------|---|-----:|---|-----:|
|mmlu              |      2|none  |      |acc   |↑  |0.7345|±  |0.0139|
| - humanities     |      2|none  |      |acc   |↑  |0.7333|±  |0.0283|
| - other          |      2|none  |      |acc   |↑  |0.7385|±  |0.0295|
| - social sciences|      2|none  |      |acc   |↑  |0.8000|±  |0.0285|
| - stem           |      2|none  |      |acc   |↑  |0.6912|±  |0.0254|

vllm (pretrained=/root/autodl-tmp/80-256,add_bos_token=true,max_model_len=3096,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.768|±  |0.0268|
|     |       |strict-match    |     5|exact_match|↑  |0.868|±  |0.0215|

vllm (pretrained=/root/autodl-tmp/80-256,add_bos_token=true,max_model_len=3096,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.764|±  |0.0190|
|     |       |strict-match    |     5|exact_match|↑  |0.884|±  |0.0143|

vllm (pretrained=/root/autodl-tmp/80-256,add_bos_token=true,max_model_len=3096,dtype=bfloat16), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: 1

|      Groups      |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|------------------|------:|------|------|------|---|-----:|---|-----:|
|mmlu              |      2|none  |      |acc   |↑  |0.7345|±  |0.0139|
| - humanities     |      2|none  |      |acc   |↑  |0.7179|±  |0.0287|
| - other          |      2|none  |      |acc   |↑  |0.7538|±  |0.0287|
| - social sciences|      2|none  |      |acc   |↑  |0.8167|±  |0.0275|
| - stem           |      2|none  |      |acc   |↑  |0.6807|±  |0.0257|
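
When reading the gsm8k rows above, it can help to see where the Stderr column comes from and how much weight a gap between the two checkpoints can bear. The sketch below is an illustration, not the harness's own code: it assumes each question is scored 0 or 1 and that the standard error uses the sample (n - 1) variance, an assumption that reproduces the reported gsm8k values. The helper names `binary_stderr` and `z_score` are hypothetical. The z statistic also treats the two runs as independent, which is a simplification because both checkpoints were evaluated on the same questions.

```python
import math


def binary_stderr(accuracy: float, n: int) -> float:
    """Standard error of the mean of n 0/1 scores (sample variance, n - 1)."""
    return math.sqrt(accuracy * (1.0 - accuracy) / (n - 1))


def z_score(acc_a: float, acc_b: float, n_a: int, n_b: int) -> float:
    """Rough z statistic for acc_b - acc_a, assuming independent runs."""
    combined = math.hypot(binary_stderr(acc_a, n_a), binary_stderr(acc_b, n_b))
    return (acc_b - acc_a) / combined


# gsm8k strict-match at limit 500: DeepCoder-14B-Preview vs. the 80-256 checkpoint
print(round(binary_stderr(0.856, 500), 4))  # → 0.0157
print(round(binary_stderr(0.884, 500), 4))  # → 0.0143
print(round(z_score(0.856, 0.884, 500, 500), 2))
```

Under these assumptions a |z| below roughly 2 means the difference sits within ordinary sampling noise for this sample size. The MMLU group rows are aggregated across subtasks, so this per-question formula is not expected to reproduce their Stderr values.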