---
base_model:
- LatitudeGames/Muse-12B
---
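The tables below are raw [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) output, run through the vLLM backend, for the original bf16 checkpoint (`/root/autodl-tmp/Muse-12B`) and several quantization candidates. As a quick start, here is a minimal sketch of loading the W8A8 model with vLLM's offline API; the repo id is an assumption based on this page, and a local path (as in the logs below) works the same way.

```python
# Minimal sketch, not a verified recipe: load the W8A8 quant with vLLM's offline API.
# The repo id below is an assumption; substitute a local path if you have the weights on disk.
from vllm import LLM, SamplingParams

llm = LLM(
    model="noneUsername/Muse-12B-W8A8",  # assumed repo id for this card
    dtype="bfloat16",
    max_model_len=3096,                  # same context limit used in most evals below
    trust_remote_code=True,
)

params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=256)
outputs = llm.generate(["Continue the story: the lighthouse keeper woke to silence."], params)
print(outputs[0].outputs[0].text)
```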
vllm (pretrained=/root/autodl-tmp/Muse-12B,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

|Tasks|Version| Filter |n-shot| Metric | |Value| |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k| 3|flexible-extract| 5|exact_match|↑ | 0.68|± |0.0296|
| | |strict-match | 5|exact_match|↑ | 0.68|± |0.0296|

vllm (pretrained=/root/autodl-tmp/Muse-12B,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

|Tasks|Version| Filter |n-shot| Metric | |Value| |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k| 3|flexible-extract| 5|exact_match|↑ |0.678|± |0.0209|
| | |strict-match | 5|exact_match|↑ |0.676|± |0.0210|

vllm (pretrained=/root/autodl-tmp/Muse-12B,add_bos_token=true,max_model_len=3048,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: 1

| Groups |Version|Filter|n-shot|Metric| |Value | |Stderr|
|------------------|------:|------|------|------|---|-----:|---|-----:|
|mmlu | 2|none | |acc |↑ |0.6713|± |0.0150|
| - humanities | 2|none | |acc |↑ |0.7026|± |0.0296|
| - other | 2|none | |acc |↑ |0.6923|± |0.0323|
| - social sciences| 2|none | |acc |↑ |0.7778|± |0.0294|
| - stem | 2|none | |acc |↑ |0.5684|± |0.0279|

vllm (pretrained=/root/autodl-tmp/Muse-12B-70-128-3096,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

|Tasks|Version| Filter |n-shot| Metric | |Value| |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k| 3|flexible-extract| 5|exact_match|↑ |0.644|± |0.0303|
| | |strict-match | 5|exact_match|↑ |0.644|± |0.0303|

vllm (pretrained=/root/autodl-tmp/Muse-12B-86-128-3096,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

|Tasks|Version| Filter |n-shot| Metric | |Value| |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k| 3|flexible-extract| 5|exact_match|↑ |0.644|± |0.0303|
| | |strict-match | 5|exact_match|↑ |0.644|± |0.0303|

vllm (pretrained=/root/autodl-tmp/Muse-12B-87-128-3096,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

|Tasks|Version| Filter |n-shot| Metric | |Value| |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k| 3|flexible-extract| 5|exact_match|↑ |0.692|± |0.0293|
| | |strict-match | 5|exact_match|↑ |0.688|± |0.0294|

vllm (pretrained=/root/autodl-tmp/Muse-12B-87-128-3096,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

|Tasks|Version| Filter |n-shot| Metric | |Value| |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k| 3|flexible-extract| 5|exact_match|↑ |0.668|± |0.0211|
| | |strict-match | 5|exact_match|↑ |0.664|± |0.0211|

vllm (pretrained=/root/autodl-tmp/Muse-12B-87-128-3096,add_bos_token=true,max_model_len=3048,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: 1

| Groups |Version|Filter|n-shot|Metric| |Value | |Stderr|
|------------------|------:|------|------|------|---|-----:|---|-----:|
|mmlu | 2|none | |acc |↑ |0.6643|± |0.0151|
| - humanities | 2|none | |acc |↑ |0.6872|± |0.0303|
| - other | 2|none | |acc |↑ |0.6872|± |0.0321|
| - social sciences| 2|none | |acc |↑ |0.7667|± |0.0301|
| - stem | 2|none | |acc |↑ |0.5684|± |0.0277|

vllm (pretrained=/root/autodl-tmp/Muse-12B-87-256-3096,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

|Tasks|Version| Filter |n-shot| Metric | |Value| |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k| 3|flexible-extract| 5|exact_match|↑ |0.672|± |0.0298|
| | |strict-match | 5|exact_match|↑ |0.676|± |0.0297|

vllm (pretrained=/root/autodl-tmp/Muse-12B-87-256-3096,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

|Tasks|Version| Filter |n-shot| Metric | |Value| |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k| 3|flexible-extract| 5|exact_match|↑ |0.686|± |0.0208|
| | |strict-match | 5|exact_match|↑ |0.684|± |0.0208|

vllm (pretrained=/root/autodl-tmp/Muse-12B-87-256-3096,add_bos_token=true,max_model_len=3048,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: 1

| Groups |Version|Filter|n-shot|Metric| |Value | |Stderr|
|------------------|------:|------|------|------|---|-----:|---|-----:|
|mmlu | 2|none | |acc |↑ |0.6620|± |0.0149|
| - humanities | 2|none | |acc |↑ |0.6821|± |0.0303|
| - other | 2|none | |acc |↑ |0.7026|± |0.0311|
| - social sciences| 2|none | |acc |↑ |0.7667|± |0.0301|
| - stem | 2|none | |acc |↑ |0.5544|± |0.0272|

vllm (pretrained=/root/autodl-tmp/Muse-12B-875-128-3096,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

|Tasks|Version| Filter |n-shot| Metric | |Value| |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k| 3|flexible-extract| 5|exact_match|↑ |0.672|± |0.0298|
| | |strict-match | 5|exact_match|↑ |0.672|± |0.0298|

vllm (pretrained=/root/autodl-tmp/Muse-12B-875-256-3096,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

|Tasks|Version| Filter |n-shot| Metric | |Value| |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k| 3|flexible-extract| 5|exact_match|↑ |0.704|± |0.0289|
| | |strict-match | 5|exact_match|↑ |0.708|± |0.0288|

vllm (pretrained=/root/autodl-tmp/Muse-12B-875-256-3096,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

|Tasks|Version| Filter |n-shot| Metric | |Value| |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k| 3|flexible-extract| 5|exact_match|↑ |0.690|± |0.0207|
| | |strict-match | 5|exact_match|↑ |0.692|± |0.0207|

vllm (pretrained=/root/autodl-tmp/Muse-12B-875-256-3096,add_bos_token=true,max_model_len=3048,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: 1

| Groups |Version|Filter|n-shot|Metric| |Value | |Stderr|
|------------------|------:|------|------|------|---|-----:|---|-----:|
|mmlu | 2|none | |acc |↑ |0.6585|± |0.0150|
| - humanities | 2|none | |acc |↑ |0.6974|± |0.0300|
| - other | 2|none | |acc |↑ |0.6718|± |0.0327|
| - social sciences| 2|none | |acc |↑ |0.7833|± |0.0291|
| - stem | 2|none | |acc |↑ |0.5439|± |0.0276|

vllm (pretrained=/root/autodl-tmp/Muse-12B-876-128-3096,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

|Tasks|Version| Filter |n-shot| Metric | |Value| |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k| 3|flexible-extract| 5|exact_match|↑ |0.656|± |0.0301|
| | |strict-match | 5|exact_match|↑ |0.656|± |0.0301|

vllm (pretrained=/root/autodl-tmp/Muse-12B-88-128-3096,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

|Tasks|Version| Filter |n-shot| Metric | |Value| |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k| 3|flexible-extract| 5|exact_match|↑ |0.644|± |0.0303|
| | |strict-match | 5|exact_match|↑ |0.648|± |0.0303|

vllm (pretrained=/root/autodl-tmp/Muse-12B-90-128-3096,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

|Tasks|Version| Filter |n-shot| Metric | |Value| |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k| 3|flexible-extract| 5|exact_match|↑ |0.664|± |0.0299|
| | |strict-match | 5|exact_match|↑ |0.668|± |0.0298|
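
Any row above can be reproduced by passing the matching settings to lm-evaluation-harness. The sketch below uses the Python entry point (the `lm_eval` CLI with equivalent flags also works) and assumes a recent lm-eval release with the vLLM extra installed; adjust `pretrained`, `limit`, and `max_model_len` to match the header line of the table you want to reproduce.

```python
# Sketch: re-run one of the gsm8k evaluations above with lm-evaluation-harness
# on the vLLM backend. Settings mirror the header lines of the tables above.
import lm_eval

results = lm_eval.simple_evaluate(
    model="vllm",
    model_args=(
        "pretrained=/root/autodl-tmp/Muse-12B,"   # or one of the quantized variants above
        "add_bos_token=true,max_model_len=3096,"
        "dtype=bfloat16,trust_remote_code=true"
    ),
    tasks=["gsm8k"],
    num_fewshot=5,
    limit=250,          # sample limit, as in the logs above
    batch_size="auto",
)
print(results["results"]["gsm8k"])  # exact_match (strict and flexible) with stderr
```

The mmlu rows were produced the same way with `tasks=["mmlu"]`, `num_fewshot` left unset, `limit=15`, `batch_size=1`, and `max_model_len=3048`.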