vllm (pretrained=/root/autodl-tmp/AM-Thinking-v1,add_bos_token=true,max_model_len=5096,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter           | n-shot | Metric      |   | Value |   | Stderr |
|-------|--------:|------------------|-------:|-------------|---|------:|---|-------:|
| gsm8k |       3 | flexible-extract |      5 | exact_match | ↑ | 0.792 | ± | 0.0257 |
|       |         | strict-match     |      5 | exact_match | ↑ | 0.780 | ± | 0.0263 |

vllm (pretrained=/root/autodl-tmp/AM-Thinking-v1,add_bos_token=true,max_model_len=3096,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter           | n-shot | Metric      |   | Value |   | Stderr |
|-------|--------:|------------------|-------:|-------------|---|------:|---|-------:|
| gsm8k |       3 | flexible-extract |      5 | exact_match | ↑ | 0.798 | ± | 0.0180 |
|       |         | strict-match     |      5 | exact_match | ↑ | 0.786 | ± | 0.0184 |

vllm (pretrained=/root/autodl-tmp/AM-Thinking-v1,add_bos_token=true,max_model_len=3048,dtype=bfloat16), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: 1

| Groups            | Version | Filter | n-shot | Metric |   | Value  |   | Stderr |
|-------------------|--------:|--------|-------:|--------|---|-------:|---|-------:|
| mmlu              |       2 | none   |        | acc    | ↑ | 0.8023 | ± | 0.0131 |
| - humanities      |       2 | none   |        | acc    | ↑ | 0.8154 | ± | 0.0276 |
| - other           |       2 | none   |        | acc    | ↑ | 0.8000 | ± | 0.0276 |
| - social sciences |       2 | none   |        | acc    | ↑ | 0.8556 | ± | 0.0255 |
| - stem            |       2 | none   |        | acc    | ↑ | 0.7614 | ± | 0.0237 |

vllm (pretrained=/root/autodl-tmp/AM-Thinking-v1-awq,add_bos_token=true,max_model_len=5096,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter           | n-shot | Metric      |   | Value |   | Stderr |
|-------|--------:|------------------|-------:|-------------|---|------:|---|-------:|
| gsm8k |       3 | flexible-extract |      5 | exact_match | ↑ | 0.820 | ± | 0.0243 |
|       |         | strict-match     |      5 | exact_match | ↑ | 0.816 | ± | 0.0246 |

vllm (pretrained=/root/autodl-tmp/AM-Thinking-v1-awq,add_bos_token=true,max_model_len=3096,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter           | n-shot | Metric      |   | Value |   | Stderr |
|-------|--------:|------------------|-------:|-------------|---|------:|---|-------:|
| gsm8k |       3 | flexible-extract |      5 | exact_match | ↑ | 0.816 | ± | 0.0173 |
|       |         | strict-match     |      5 | exact_match | ↑ | 0.814 | ± | 0.0174 |

vllm (pretrained=/root/autodl-tmp/AM-Thinking-v1-awq,add_bos_token=true,max_model_len=3048,dtype=bfloat16), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: 1

| Groups            | Version | Filter | n-shot | Metric |   | Value  |   | Stderr |
|-------------------|--------:|--------|-------:|--------|---|-------:|---|-------:|
| mmlu              |       2 | none   |        | acc    | ↑ | 0.7930 | ± | 0.0132 |
| - humanities      |       2 | none   |        | acc    | ↑ | 0.8051 | ± | 0.0278 |
| - other           |       2 | none   |        | acc    | ↑ | 0.7846 | ± | 0.0277 |
| - social sciences |       2 | none   |        | acc    | ↑ | 0.8444 | ± | 0.0261 |
| - stem            |       2 | none   |        | acc    | ↑ | 0.7579 | ± | 0.0242 |
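The headers above look like output from EleutherAI's lm-evaluation-harness with its vLLM backend. As a minimal sketch (assuming `lm_eval` and `vllm` are installed, and using the local model path and settings from the first GSM8K header above), an equivalent invocation might be:

```shell
# Sketch of reproducing the first GSM8K run above with lm-evaluation-harness.
# Assumes: pip install "lm_eval[vllm]" and a local copy of the model weights.
lm_eval --model vllm \
  --model_args pretrained=/root/autodl-tmp/AM-Thinking-v1,add_bos_token=true,max_model_len=5096,dtype=bfloat16 \
  --tasks gsm8k \
  --num_fewshot 5 \
  --limit 250 \
  --batch_size auto
```

The MMLU runs above would swap `--tasks mmlu` with the corresponding `--limit` and `--batch_size` values from their headers.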
Model: noneUsername/AM-Thinking-v1-awq (AWQ quantization of AM-Thinking-v1)