vllm (pretrained=/root/autodl-tmp/Seed-Coder-8B-Instruct,add_bos_token=true,max_model_len=3096,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.576 | ± | 0.0313 |
| | | strict-match | 5 | exact_match | ↑ | 0.576 | ± | 0.0313 |

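
The Stderr column in the gsm8k tables appears consistent with the standard error of a mean of 0/1 scores computed with the sample (n - 1) variance. The following is a minimal sketch under that assumption, not the harness's own code; `binomial_stderr` is a hypothetical helper name, and `n` is taken from the `limit` value in each run header.

```python
import math


def binomial_stderr(accuracy: float, n: int) -> float:
    """Standard error of the mean of n 0/1 scores, using the sample (n - 1) variance."""
    return math.sqrt(accuracy * (1.0 - accuracy) / (n - 1))


# Values taken from the gsm8k flexible-extract rows for Seed-Coder-8B-Instruct.
print(round(binomial_stderr(0.576, 250), 4))  # limit 250 -> 0.0313
print(round(binomial_stderr(0.602, 500), 4))  # limit 500 -> 0.0219
```

Doubling the limit from 250 to 500 shrinks the standard error by roughly a factor of √2, which is why the 500-sample rows are the better basis for comparing checkpoints.
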
vllm (pretrained=/root/autodl-tmp/Seed-Coder-8B-Instruct,add_bos_token=true,max_model_len=3096,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.602 | ± | 0.0219 |
| | | strict-match | 5 | exact_match | ↑ | 0.598 | ± | 0.0219 |

vllm (pretrained=/root/autodl-tmp/Seed-Coder-8B-Instruct,add_bos_token=true,max_model_len=3048,dtype=bfloat16), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: auto

| Groups | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| mmlu | 2 | none | | acc | ↑ | 0.4386 | ± | 0.0167 |
| - humanities | 2 | none | | acc | ↑ | 0.4000 | ± | 0.0343 |
| - other | 2 | none | | acc | ↑ | 0.4872 | ± | 0.0356 |
| - social sciences | 2 | none | | acc | ↑ | 0.4389 | ± | 0.0364 |
| - stem | 2 | none | | acc | ↑ | 0.4316 | ± | 0.0288 |

vllm (pretrained=/root/autodl-tmp/80-128,add_bos_token=true,max_model_len=3096,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.56 | ± | 0.0315 |
| | | strict-match | 5 | exact_match | ↑ | 0.56 | ± | 0.0315 |

vllm (pretrained=/root/autodl-tmp/80-128,add_bos_token=true,max_model_len=3096,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.590 | ± | 0.0220 |
| | | strict-match | 5 | exact_match | ↑ | 0.584 | ± | 0.0221 |

vllm (pretrained=/root/autodl-tmp/80-128,add_bos_token=true,max_model_len=3048,dtype=bfloat16), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: auto

| Groups | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| mmlu | 2 | none | | acc | ↑ | 0.4339 | ± | 0.0166 |
| - humanities | 2 | none | | acc | ↑ | 0.3949 | ± | 0.0338 |
| - other | 2 | none | | acc | ↑ | 0.4769 | ± | 0.0355 |
| - social sciences | 2 | none | | acc | ↑ | 0.4333 | ± | 0.0361 |
| - stem | 2 | none | | acc | ↑ | 0.4316 | ± | 0.0290 |

vllm (pretrained=/root/autodl-tmp/80-256,add_bos_token=true,max_model_len=3096,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.584 | ± | 0.0312 |
| | | strict-match | 5 | exact_match | ↑ | 0.584 | ± | 0.0312 |

vllm (pretrained=/root/autodl-tmp/80-256,add_bos_token=true,max_model_len=3096,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.590 | ± | 0.022 |
| | | strict-match | 5 | exact_match | ↑ | 0.586 | ± | 0.022 |

vllm (pretrained=/root/autodl-tmp/80-256,add_bos_token=true,max_model_len=3048,dtype=bfloat16), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: auto

| Groups | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| mmlu | 2 | none | | acc | ↑ | 0.4246 | ± | 0.0165 |
| - humanities | 2 | none | | acc | ↑ | 0.3795 | ± | 0.0336 |
| - other | 2 | none | | acc | ↑ | 0.4872 | ± | 0.0356 |
| - social sciences | 2 | none | | acc | ↑ | 0.4333 | ± | 0.0360 |
| - stem | 2 | none | | acc | ↑ | 0.4070 | ± | 0.0282 |

vllm (pretrained=/root/autodl-tmp/80-512,add_bos_token=true,max_model_len=3096,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.604 | ± | 0.031 |
| | | strict-match | 5 | exact_match | ↑ | 0.600 | ± | 0.031 |

vllm (pretrained=/root/autodl-tmp/80-512,add_bos_token=true,max_model_len=3096,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.594 | ± | 0.022 |
| | | strict-match | 5 | exact_match | ↑ | 0.586 | ± | 0.022 |

vllm (pretrained=/root/autodl-tmp/80-512,add_bos_token=true,max_model_len=3048,dtype=bfloat16), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: auto

| Groups | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| mmlu | 2 | none | | acc | ↑ | 0.4316 | ± | 0.0166 |
| - humanities | 2 | none | | acc | ↑ | 0.4000 | ± | 0.0341 |
| - other | 2 | none | | acc | ↑ | 0.4821 | ± | 0.0355 |
| - social sciences | 2 | none | | acc | ↑ | 0.4278 | ± | 0.0356 |
| - stem | 2 | none | | acc | ↑ | 0.4211 | ± | 0.0289 |

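
One rough way to read these tables is to express each checkpoint's gap to the Seed-Coder-8B-Instruct run in units of combined standard error. The sketch below assumes the two runs can be treated as independent samples, which is conservative here because every run scores the same questions and a paired comparison would be tighter. `gap_in_stderr_units` is a hypothetical helper, and the inputs are the gsm8k flexible-extract rows at limit 500.

```python
import math


def gap_in_stderr_units(base: float, base_se: float,
                        other: float, other_se: float) -> tuple[float, float]:
    """Return (score difference, difference divided by the combined stderr).

    Assumes independent runs, so the combined stderr is the root sum of squares.
    """
    diff = other - base
    combined_se = math.sqrt(base_se ** 2 + other_se ** 2)
    return diff, diff / combined_se


# Seed-Coder-8B-Instruct (0.602 ± 0.0219) vs. the 80-512 checkpoint (0.594 ± 0.022).
diff, z = gap_in_stderr_units(0.602, 0.0219, 0.594, 0.022)
print(f"difference = {diff:+.3f}, in stderr units = {z:+.2f}")
```

A magnitude well below 1 suggests the gap is within sampling noise at this sample size, though it does not show that the two models are equivalent.
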
Model tree for noneUsername/Seed-Coder-8B-Instruct-W8A8
Base model
ByteDance-Seed/Seed-Coder-8B-Base