vllm (pretrained=/root/autodl-tmp/Seed-Coder-8B-Instruct,add_bos_token=true,max_model_len=3096,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

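| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|--------|-------:|--------|------:|-------:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.576 | ± 0.0313 |
| | | strict-match | 5 | exact_match | 0.576 | ± 0.0313 |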

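For a 0/1 metric such as exact_match, the Stderr values in these gsm8k tables are consistent with the standard error of a sample mean computed with the sample (n-1) variance, i.e. sqrt(p(1-p)/(n-1)) with n equal to the `limit`. The check below is a minimal sketch under that assumption; `binary_mean_stderr` is a hypothetical helper name, not an lm-eval function.

```python
import math


def binary_mean_stderr(p: float, n: int) -> float:
    """Standard error of the mean of n 0/1 scores whose mean is p,
    using the sample (n-1) variance: sqrt(p * (1 - p) / (n - 1))."""
    return math.sqrt(p * (1.0 - p) / (n - 1))


# (accuracy, limit) pairs taken from the baseline gsm8k tables in this card.
for p, n in [(0.576, 250), (0.602, 500)]:
    print(f"p={p:.3f} n={n} stderr={binary_mean_stderr(p, n):.4f}")
# p=0.576 n=250 stderr=0.0313
# p=0.602 n=500 stderr=0.0219
```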
vllm (pretrained=/root/autodl-tmp/Seed-Coder-8B-Instruct,add_bos_token=true,max_model_len=3096,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

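| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|--------|-------:|--------|------:|-------:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.602 | ± 0.0219 |
| | | strict-match | 5 | exact_match | 0.598 | ± 0.0219 |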

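The strict-match and flexible-extract rows score the same generations and differ only in how the final answer is pulled out of the text. The sketch below approximates the two filters with regular expressions modeled on lm-eval's gsm8k task config: strict-match expects the `#### <number>` format used by the few-shot examples, while flexible-extract takes the last number-like span anywhere in the output. The exact patterns and the helper names `strict_extract` / `flexible_extract` are assumptions for illustration; check the gsm8k task YAML of the lm-eval version you actually run.

```python
import re
from typing import Optional

# Patterns modeled on lm-eval's gsm8k filters (assumption: verify for your version).
STRICT = re.compile(r"#### (\-?[0-9\.\,]+)")
FLEXIBLE = re.compile(r"(-?[$0-9.,]{2,})|(-?[0-9]+)")


def strict_extract(text: str) -> Optional[str]:
    """Return the number after the first '#### ', or None if that format is missing."""
    m = STRICT.search(text)
    return m.group(1) if m else None


def flexible_extract(text: str) -> Optional[str]:
    """Return the last number-like span anywhere in the text, or None."""
    matches = FLEXIBLE.findall(text)  # list of (group1, group2) tuples
    if not matches:
        return None
    last = matches[-1]
    # Exactly one of the two alternation groups is non-empty.
    return last[0] or last[1]


# A response that follows the few-shot format is scored by both filters;
# one that omits '#### ' only counts under flexible-extract.
print(strict_extract("3 + 4 = 7\n#### 7"), flexible_extract("3 + 4 = 7\n#### 7"))
print(strict_extract("The answer is 18"), flexible_extract("The answer is 18"))
```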
vllm (pretrained=/root/autodl-tmp/Seed-Coder-8B-Instruct,add_bos_token=true,max_model_len=3048,dtype=bfloat16), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: auto

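| Groups | Version | Filter | n-shot | Metric | Value | Stderr |
|--------|--------:|--------|-------:|--------|------:|-------:|
| mmlu | 2 | none | | acc | 0.4386 | ± 0.0167 |
| - humanities | 2 | none | | acc | 0.4000 | ± 0.0343 |
| - other | 2 | none | | acc | 0.4872 | ± 0.0356 |
| - social sciences | 2 | none | | acc | 0.4389 | ± 0.0364 |
| - stem | 2 | none | | acc | 0.4316 | ± 0.0288 |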

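With `limit: 15.0` applied to each MMLU subject, the overall mmlu accuracy is consistent with a question-weighted mean of the four category scores: assuming the standard 57-subject split (13 humanities, 13 other, 12 social sciences, 19 STEM) and lm-eval's size-weighted group aggregation, the categories hold 195, 195, 180 and 285 questions, 855 in total. The sketch below checks that arithmetic; `overall_acc` is a hypothetical helper, not part of lm-eval. Note that the group Stderr values are not reproduced by this simple formula, so only the accuracies are checked.

```python
from typing import Dict

# Subjects per MMLU category (standard 57-subject split) and the per-subject
# limit used in these runs.
SUBJECTS = {"humanities": 13, "other": 13, "social sciences": 12, "stem": 19}
LIMIT = 15


def overall_acc(category_acc: Dict[str, float]) -> float:
    """Question-weighted mean of category accuracies, where category c
    contributes SUBJECTS[c] * LIMIT questions."""
    correct = sum(category_acc[c] * SUBJECTS[c] * LIMIT for c in SUBJECTS)
    total = sum(n * LIMIT for n in SUBJECTS.values())
    return correct / total


# Category accuracies from the Seed-Coder-8B-Instruct MMLU table above.
base = {"humanities": 0.4000, "other": 0.4872, "social sciences": 0.4389, "stem": 0.4316}
print(f"{overall_acc(base):.4f}")  # 0.4386, matching the reported mmlu acc
```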
vllm (pretrained=/root/autodl-tmp/80-128,add_bos_token=true,max_model_len=3096,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

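| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|--------|-------:|--------|------:|-------:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.56 | ± 0.0315 |
| | | strict-match | 5 | exact_match | 0.56 | ± 0.0315 |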

vllm (pretrained=/root/autodl-tmp/80-128,add_bos_token=true,max_model_len=3096,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

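| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|--------|-------:|--------|------:|-------:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.590 | ± 0.0220 |
| | | strict-match | 5 | exact_match | 0.584 | ± 0.0221 |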

vllm (pretrained=/root/autodl-tmp/80-128,add_bos_token=true,max_model_len=3048,dtype=bfloat16), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: auto

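| Groups | Version | Filter | n-shot | Metric | Value | Stderr |
|--------|--------:|--------|-------:|--------|------:|-------:|
| mmlu | 2 | none | | acc | 0.4339 | ± 0.0166 |
| - humanities | 2 | none | | acc | 0.3949 | ± 0.0338 |
| - other | 2 | none | | acc | 0.4769 | ± 0.0355 |
| - social sciences | 2 | none | | acc | 0.4333 | ± 0.0361 |
| - stem | 2 | none | | acc | 0.4316 | ± 0.0290 |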

vllm (pretrained=/root/autodl-tmp/80-256,add_bos_token=true,max_model_len=3096,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

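| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|--------|-------:|--------|------:|-------:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.584 | ± 0.0312 |
| | | strict-match | 5 | exact_match | 0.584 | ± 0.0312 |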

vllm (pretrained=/root/autodl-tmp/80-256,add_bos_token=true,max_model_len=3096,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

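| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|--------|-------:|--------|------:|-------:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.590 | ± 0.022 |
| | | strict-match | 5 | exact_match | 0.586 | ± 0.022 |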

vllm (pretrained=/root/autodl-tmp/80-256,add_bos_token=true,max_model_len=3048,dtype=bfloat16), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: auto

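| Groups | Version | Filter | n-shot | Metric | Value | Stderr |
|--------|--------:|--------|-------:|--------|------:|-------:|
| mmlu | 2 | none | | acc | 0.4246 | ± 0.0165 |
| - humanities | 2 | none | | acc | 0.3795 | ± 0.0336 |
| - other | 2 | none | | acc | 0.4872 | ± 0.0356 |
| - social sciences | 2 | none | | acc | 0.4333 | ± 0.0360 |
| - stem | 2 | none | | acc | 0.4070 | ± 0.0282 |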

vllm (pretrained=/root/autodl-tmp/80-512,add_bos_token=true,max_model_len=3096,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

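| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|--------|-------:|--------|------:|-------:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.604 | ± 0.031 |
| | | strict-match | 5 | exact_match | 0.600 | ± 0.031 |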

vllm (pretrained=/root/autodl-tmp/80-512,add_bos_token=true,max_model_len=3096,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

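| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|--------|-------:|--------|------:|-------:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.594 | ± 0.022 |
| | | strict-match | 5 | exact_match | 0.586 | ± 0.022 |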

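One rough way to read the gaps between the quantized checkpoints (80-128, 80-256, 80-512) and the original model against the reported Stderr is to divide each accuracy difference by the combined standard error sqrt(se_a² + se_b²). This treats the runs as independent, which they are not (every model answers the same 500 gsm8k questions), so it is only a coarse gauge; a paired comparison on per-question results would be tighter. The sketch uses the limit-500 flexible-extract rows from this card; `gap_in_stderrs` is a hypothetical helper. With these numbers each gap comes out smaller than 0.4 combined standard errors.

```python
import math

# (accuracy, stderr) for gsm8k flexible-extract at limit 500, from the tables above.
RESULTS = {
    "Seed-Coder-8B-Instruct": (0.602, 0.0219),
    "80-128": (0.590, 0.0220),
    "80-256": (0.590, 0.022),
    "80-512": (0.594, 0.022),
}
BASELINE = "Seed-Coder-8B-Instruct"


def gap_in_stderrs(a, b):
    """(acc_a - acc_b) / sqrt(se_a**2 + se_b**2), assuming independent runs."""
    (pa, sa), (pb, sb) = a, b
    return (pa - pb) / math.hypot(sa, sb)


base = RESULTS[BASELINE]
for name, res in RESULTS.items():
    if name == BASELINE:
        continue
    print(f"{name}: delta={res[0] - base[0]:+.3f}, z={gap_in_stderrs(res, base):+.2f}")
```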
vllm (pretrained=/root/autodl-tmp/80-512,add_bos_token=true,max_model_len=3048,dtype=bfloat16), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: auto

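| Groups | Version | Filter | n-shot | Metric | Value | Stderr |
|--------|--------:|--------|-------:|--------|------:|-------:|
| mmlu | 2 | none | | acc | 0.4316 | ± 0.0166 |
| - humanities | 2 | none | | acc | 0.4000 | ± 0.0341 |
| - other | 2 | none | | acc | 0.4821 | ± 0.0355 |
| - social sciences | 2 | none | | acc | 0.4278 | ± 0.0356 |
| - stem | 2 | none | | acc | 0.4211 | ± 0.0289 |
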
Safetensors model size: 8.25B params
Tensor types: BF16, I8

Model tree for noneUsername/Seed-Coder-8B-Instruct-W8A8