Details of Ability Loss
Original model:
vllm (pretrained=/root/autodl-tmp/Devstral-Small-2505,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.864 | ± | 0.0217 |
| | | strict-match | 5 | exact_match | ↑ | 0.860 | ± | 0.0220 |
vllm (pretrained=/root/autodl-tmp/Devstral-Small-2505,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.868 | ± | 0.0152 |
| | | strict-match | 5 | exact_match | ↑ | 0.864 | ± | 0.0153 |
vllm (pretrained=/root/autodl-tmp/Devstral-Small-2505,add_bos_token=true,max_model_len=3048,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: 1
| Groups | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| mmlu | 2 | none | | acc | ↑ | 0.7965 | ± | 0.0129 |
| - humanities | 2 | none | | acc | ↑ | 0.8205 | ± | 0.0244 |
| - other | 2 | none | | acc | ↑ | 0.8308 | ± | 0.0259 |
| - social sciences | 2 | none | | acc | ↑ | 0.8444 | ± | 0.0261 |
| - stem | 2 | none | | acc | ↑ | 0.7263 | ± | 0.0252 |
Final W8A8 quantized model:
vllm (pretrained=/root/autodl-tmp/87-128-3096,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.860 | ± | 0.0220 |
| | | strict-match | 5 | exact_match | ↑ | 0.856 | ± | 0.0222 |
vllm (pretrained=/root/autodl-tmp/87-128-3096,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.85 | ± | 0.0160 |
| | | strict-match | 5 | exact_match | ↑ | 0.84 | ± | 0.0164 |
vllm (pretrained=/root/autodl-tmp/87-128-3096,add_bos_token=true,max_model_len=3048,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: 1
| Groups | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| mmlu | 2 | none | | acc | ↑ | 0.7509 | ± | 0.0139 |
| - humanities | 2 | none | | acc | ↑ | 0.7949 | ± | 0.0261 |
| - other | 2 | none | | acc | ↑ | 0.7641 | ± | 0.0287 |
| - social sciences | 2 | none | | acc | ↑ | 0.8167 | ± | 0.0285 |
| - stem | 2 | none | | acc | ↑ | 0.6702 | ± | 0.0268 |
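
The Stderr column gives a rough way to judge whether a gap between the original and quantized scores exceeds sampling noise. The sketch below divides each gap by the combined standard error of the two runs. It assumes the two runs can be treated as independent samples, which is a simplification (both models answer the same questions, so a paired comparison would be tighter); the `RESULTS` list and `z_score` helper are illustrative names, not part of the evaluation harness.

```python
import math

# (label, original score, original stderr, quantized score, quantized stderr),
# copied from the result tables for the two models.
RESULTS = [
    ("gsm8k strict-match, limit 250", 0.860, 0.0220, 0.856, 0.0222),
    ("gsm8k strict-match, limit 500", 0.864, 0.0153, 0.84, 0.0164),
    ("mmlu acc", 0.7965, 0.0129, 0.7509, 0.0139),
]


def z_score(orig, orig_se, quant, quant_se):
    """Gap between two scores in units of its combined standard error.

    Assumption: the two runs are independent, so their variances add.
    """
    return (orig - quant) / math.hypot(orig_se, quant_se)


for label, o, o_se, q, q_se in RESULTS:
    print(f"{label}: z = {z_score(o, o_se, q, q_se):.2f}")
```

Under that assumption the two gsm8k gaps come out at roughly 0.13 and 1.07 combined standard errors, while the mmlu gap is about 2.4, so only the mmlu drop stands clearly above the noise level of these limited-sample runs.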
gsm8k strict-match (limit 250): 0.860 -> 0.856: ↓0.004 (0.47%)
gsm8k strict-match (limit 500): 0.864 -> 0.84: ↓0.024 (2.8%)
mmlu acc: 0.7965 -> 0.7509: ↓0.0456 (5.73%)
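
The percentages in parentheses are relative drops: the absolute drop divided by the original model's score. A minimal sketch for recomputing them, where `drop` and `PAIRS` are hypothetical names introduced only for this example:

```python
def drop(orig, quant):
    """Return (absolute drop, relative drop as a percentage of the original score)."""
    absolute = orig - quant
    return absolute, 100.0 * absolute / orig


# (original score, quantized score) pairs taken from the summary lines.
PAIRS = [(0.860, 0.856), (0.864, 0.84), (0.7965, 0.7509)]

for orig, quant in PAIRS:
    absolute, relative = drop(orig, quant)
    print(f"{orig} -> {quant}: -{absolute:.4f} ({relative:.2f}%)")
```

Reporting the relative figure alongside the absolute one keeps the gsm8k and mmlu results comparable even though their baseline scores differ.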