---
base_model:
  - shisa-ai/shisa-v2-unphi4-14b
---

vllm (pretrained=/root/autodl-tmp/shisa-v2-unphi4-14b,add_bos_token=true,max_model_len=3096,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

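| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|--------|-------:|--------|------:|-------:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.916 | ± 0.0176 |
| | | strict-match | 5 | exact_match | 0.916 | ± 0.0176 |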

vllm (pretrained=/root/autodl-tmp/shisa-v2-unphi4-14b,add_bos_token=true,max_model_len=3096,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

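| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|--------|-------:|--------|------:|-------:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.928 | ± 0.0116 |
| | | strict-match | 5 | exact_match | 0.928 | ± 0.0116 |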
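
The Stderr values in the gsm8k tables above are consistent with the standard error of the mean of per-example 0/1 scores. The sketch below checks that arithmetic; it assumes the harness uses the sample (n - 1) variance, and `binary_stderr` is a hypothetical helper name, not part of any library.

```python
import math


def binary_stderr(accuracy: float, n: int) -> float:
    """Standard error of the mean of n 0/1 scores, using the sample (n - 1) variance.

    Assumption: this is how the reported Stderr column was computed.
    """
    return math.sqrt(accuracy * (1.0 - accuracy) / (n - 1))


# (accuracy, limit, reported stderr) copied from the base-model gsm8k tables.
reported = [
    (0.916, 250, 0.0176),
    (0.928, 500, 0.0116),
]

for acc, n, stderr in reported:
    estimate = binary_stderr(acc, n)
    print(f"acc={acc} n={n} estimated={estimate:.4f} reported={stderr:.4f}")
```

Doubling the limit from 250 to 500 shrinks the error bar by roughly a factor of 1.4 to 1.5, which is why the 500-sample runs are the better basis for comparing models.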

vllm (pretrained=/root/autodl-tmp/shisa-v2-unphi4-14b,add_bos_token=true,max_model_len=3048,dtype=bfloat16), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: 1

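| Groups | Version | Filter | n-shot | Metric | Value | Stderr |
|--------|--------:|--------|-------:|--------|------:|-------:|
| mmlu | 2 | none | | acc | 0.7708 | ± 0.0137 |
| - humanities | 2 | none | | acc | 0.8205 | ± 0.0261 |
| - other | 2 | none | | acc | 0.7590 | ± 0.0293 |
| - social sciences | 2 | none | | acc | 0.8167 | ± 0.0280 |
| - stem | 2 | none | | acc | 0.7158 | ± 0.0255 |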

vllm (pretrained=/root/autodl-tmp/shisa-v2-unphi-14b-W8A8-INT8,add_bos_token=true,max_model_len=3096,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

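| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|--------|-------:|--------|------:|-------:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.908 | ± 0.0183 |
| | | strict-match | 5 | exact_match | 0.908 | ± 0.0183 |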

vllm (pretrained=/root/autodl-tmp/shisa-v2-unphi-14b-W8A8-INT8,add_bos_token=true,max_model_len=3096,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

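| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|--------|-------:|--------|------:|-------:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.916 | ± 0.0124 |
| | | strict-match | 5 | exact_match | 0.916 | ± 0.0124 |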
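
To gauge whether the gap between the base model and the W8A8-INT8 build on gsm8k (limit 500) exceeds sampling noise, a rough check is to divide the accuracy difference by the combined standard error. This is only a sketch: it treats the two runs as independent samples, which is an assumption (both runs score the same questions, so a paired test would be more appropriate), and `difference_z_score` is a hypothetical helper name.

```python
import math


def difference_z_score(acc_a: float, se_a: float, acc_b: float, se_b: float) -> float:
    """Accuracy difference divided by the combined standard error.

    Assumption: the two runs are treated as independent samples.
    """
    return (acc_a - acc_b) / math.sqrt(se_a**2 + se_b**2)


# Values copied from the gsm8k tables with limit 500.
base_acc, base_se = 0.928, 0.0116    # shisa-v2-unphi4-14b
quant_acc, quant_se = 0.916, 0.0124  # W8A8-INT8 build

z = difference_z_score(base_acc, base_se, quant_acc, quant_se)
print(f"difference={base_acc - quant_acc:.3f} z={z:.2f}")
```

Under these assumptions the gap is less than one combined standard error, so these runs alone do not show a clear gsm8k regression from quantization.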

vllm (pretrained=/root/autodl-tmp/shisa-v2-unphi-14b-W8A8-INT8,add_bos_token=true,max_model_len=3048,dtype=bfloat16), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: 1

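| Groups | Version | Filter | n-shot | Metric | Value | Stderr |
|--------|--------:|--------|-------:|--------|------:|-------:|
| mmlu | 2 | none | | acc | 0.7673 | ± 0.0136 |
| - humanities | 2 | none | | acc | 0.8205 | ± 0.0260 |
| - other | 2 | none | | acc | 0.7590 | ± 0.0285 |
| - social sciences | 2 | none | | acc | 0.8111 | ± 0.0286 |
| - stem | 2 | none | | acc | 0.7088 | ± 0.0253 |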

vllm (pretrained=/root/autodl-tmp/8625-01-512,add_bos_token=true,max_model_len=3096,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

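| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|--------|-------:|--------|------:|-------:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.924 | ± 0.0168 |
| | | strict-match | 5 | exact_match | 0.924 | ± 0.0168 |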

vllm (pretrained=/root/autodl-tmp/8625-01-512,add_bos_token=true,max_model_len=3096,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

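| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|--------|-------:|--------|------:|-------:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.928 | ± 0.0116 |
| | | strict-match | 5 | exact_match | 0.928 | ± 0.0116 |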

vllm (pretrained=/root/autodl-tmp/8625-01-512,add_bos_token=true,max_model_len=3048,dtype=bfloat16), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: 1

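| Groups | Version | Filter | n-shot | Metric | Value | Stderr |
|--------|--------:|--------|-------:|--------|------:|-------:|
| mmlu | 2 | none | | acc | 0.7708 | ± 0.0136 |
| - humanities | 2 | none | | acc | 0.8103 | ± 0.0269 |
| - other | 2 | none | | acc | 0.7692 | ± 0.0287 |
| - social sciences | 2 | none | | acc | 0.8222 | ± 0.0278 |
| - stem | 2 | none | | acc | 0.7123 | ± 0.0254 |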