---
base_model:
- LatitudeGames/Muse-12B
---

vllm (pretrained=/root/autodl-tmp/Muse-12B,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  | 0.68|±  |0.0296|
|     |       |strict-match    |     5|exact_match|↑  | 0.68|±  |0.0296|

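The `Stderr` column in the gsm8k table above can be related to the `Value` and `limit` fields with a short calculation. The sketch below assumes the standard error is the sample standard deviation of the per-question 0/1 scores (with an `n - 1` denominator) divided by `sqrt(n)`. That assumption is not stated in this file; it is simply consistent with the numbers reported here. The helper name `binomial_stderr` is hypothetical.

```python
import math


def binomial_stderr(accuracy: float, n: int) -> float:
    """Standard error of the mean of n 0/1 scores.

    Assumes the sample variance uses an (n - 1) denominator, which
    simplifies to sqrt(p * (1 - p) / (n - 1)) for binary scores.
    """
    return math.sqrt(accuracy * (1.0 - accuracy) / (n - 1))


# Value and limit copied from the gsm8k run above (limit: 250.0).
stderr_250 = binomial_stderr(0.68, 250)
print(round(stderr_250, 4))  # -> 0.0296, matching the Stderr column
```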
vllm (pretrained=/root/autodl-tmp/Muse-12B,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.678|±  |0.0209|
|     |       |strict-match    |     5|exact_match|↑  |0.676|±  |0.0210|

vllm (pretrained=/root/autodl-tmp/Muse-12B,add_bos_token=true,max_model_len=3048,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: 1

|      Groups      |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|------------------|------:|------|------|------|---|-----:|---|-----:|
|mmlu              |      2|none  |      |acc   |↑  |0.6713|±  |0.0150|
| - humanities     |      2|none  |      |acc   |↑  |0.7026|±  |0.0296|
| - other          |      2|none  |      |acc   |↑  |0.6923|±  |0.0323|
| - social sciences|      2|none  |      |acc   |↑  |0.7778|±  |0.0294|
| - stem           |      2|none  |      |acc   |↑  |0.5684|±  |0.0279|

vllm (pretrained=/root/autodl-tmp/Muse-12B-70-128-3096,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.644|±  |0.0303|
|     |       |strict-match    |     5|exact_match|↑  |0.644|±  |0.0303|

vllm (pretrained=/root/autodl-tmp/Muse-12B-86-128-3096,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.644|±  |0.0303|
|     |       |strict-match    |     5|exact_match|↑  |0.644|±  |0.0303|

vllm (pretrained=/root/autodl-tmp/Muse-12B-87-128-3096,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.692|±  |0.0293|
|     |       |strict-match    |     5|exact_match|↑  |0.688|±  |0.0294|

vllm (pretrained=/root/autodl-tmp/Muse-12B-87-128-3096,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.668|±  |0.0211|
|     |       |strict-match    |     5|exact_match|↑  |0.664|±  |0.0211|

vllm (pretrained=/root/autodl-tmp/Muse-12B-87-128-3096,add_bos_token=true,max_model_len=3048,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: 1

|      Groups      |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|------------------|------:|------|------|------|---|-----:|---|-----:|
|mmlu              |      2|none  |      |acc   |↑  |0.6643|±  |0.0151|
| - humanities     |      2|none  |      |acc   |↑  |0.6872|±  |0.0303|
| - other          |      2|none  |      |acc   |↑  |0.6872|±  |0.0321|
| - social sciences|      2|none  |      |acc   |↑  |0.7667|±  |0.0301|
| - stem           |      2|none  |      |acc   |↑  |0.5684|±  |0.0277|

vllm (pretrained=/root/autodl-tmp/Muse-12B-87-256-3096,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.672|±  |0.0298|
|     |       |strict-match    |     5|exact_match|↑  |0.676|±  |0.0297|

vllm (pretrained=/root/autodl-tmp/Muse-12B-87-256-3096,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.686|±  |0.0208|
|     |       |strict-match    |     5|exact_match|↑  |0.684|±  |0.0208|

vllm (pretrained=/root/autodl-tmp/Muse-12B-87-256-3096,add_bos_token=true,max_model_len=3048,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: 1

|      Groups      |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|------------------|------:|------|------|------|---|-----:|---|-----:|
|mmlu              |      2|none  |      |acc   |↑  |0.6620|±  |0.0149|
| - humanities     |      2|none  |      |acc   |↑  |0.6821|±  |0.0303|
| - other          |      2|none  |      |acc   |↑  |0.7026|±  |0.0311|
| - social sciences|      2|none  |      |acc   |↑  |0.7667|±  |0.0301|
| - stem           |      2|none  |      |acc   |↑  |0.5544|±  |0.0272|

vllm (pretrained=/root/autodl-tmp/Muse-12B-875-128-3096,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.672|±  |0.0298|
|     |       |strict-match    |     5|exact_match|↑  |0.672|±  |0.0298|

vllm (pretrained=/root/autodl-tmp/Muse-12B-875-256-3096,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.704|±  |0.0289|
|     |       |strict-match    |     5|exact_match|↑  |0.708|±  |0.0288|

vllm (pretrained=/root/autodl-tmp/Muse-12B-875-256-3096,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.690|±  |0.0207|
|     |       |strict-match    |     5|exact_match|↑  |0.692|±  |0.0207|

vllm (pretrained=/root/autodl-tmp/Muse-12B-875-256-3096,add_bos_token=true,max_model_len=3048,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: 1

|      Groups      |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|------------------|------:|------|------|------|---|-----:|---|-----:|
|mmlu              |      2|none  |      |acc   |↑  |0.6585|±  |0.0150|
| - humanities     |      2|none  |      |acc   |↑  |0.6974|±  |0.0300|
| - other          |      2|none  |      |acc   |↑  |0.6718|±  |0.0327|
| - social sciences|      2|none  |      |acc   |↑  |0.7833|±  |0.0291|
| - stem           |      2|none  |      |acc   |↑  |0.5439|±  |0.0276|

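When comparing a variant against the base model, it may help to check whether the gap exceeds the reported standard errors. The sketch below compares the 500-sample gsm8k `flexible-extract` scores of `Muse-12B` (0.678 ± 0.0209) and `Muse-12B-875-256-3096` (0.690 ± 0.0207), both copied from this file. It treats the two runs as independent estimates, which is a simplifying assumption: both runs presumably score the same questions, so a paired test on per-question results would be more precise. The helper name `z_score` is hypothetical. A `|z|` well below 2 suggests the difference is within sampling noise at this sample size.

```python
import math


def z_score(value_a: float, stderr_a: float,
            value_b: float, stderr_b: float) -> float:
    """z statistic for the difference of two estimates.

    Assumes the two estimates are independent, so their variances add.
    """
    return (value_b - value_a) / math.sqrt(stderr_a ** 2 + stderr_b ** 2)


# gsm8k flexible-extract, limit: 500.0 (values copied from the tables).
z = z_score(0.678, 0.0209, 0.690, 0.0207)
print(f"z = {z:.2f}")  # -> z = 0.41
```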
vllm (pretrained=/root/autodl-tmp/Muse-12B-876-128-3096,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.656|±  |0.0301|
|     |       |strict-match    |     5|exact_match|↑  |0.656|±  |0.0301|

vllm (pretrained=/root/autodl-tmp/Muse-12B-88-128-3096,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.644|±  |0.0303|
|     |       |strict-match    |     5|exact_match|↑  |0.648|±  |0.0303|

vllm (pretrained=/root/autodl-tmp/Muse-12B-90-128-3096,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.664|±  |0.0299|
|     |       |strict-match    |     5|exact_match|↑  |0.668|±  |0.0298|