---
base_model:
  - ReadyArt/Forgotten-Safeword-24B-V3.0
---

vllm (pretrained=/root/autodl-tmp/Forgotten-Safeword-24B-V3.0,add_bos_token=true,max_model_len=4096,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

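| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|--------|-------:|--------|------:|-------:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.92 | ± 0.0172 |
| | | strict-match | 5 | exact_match | 0.92 | ± 0.0172 |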
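
The Stderr column can be sanity-checked from the accuracy and the sample limit alone. The sketch below assumes the harness reports the sample standard error of the mean over binary correct/incorrect scores (an assumption about how the numbers were produced, not something stated in this README); `proportion_stderr` is a hypothetical helper name.

```python
import math

def proportion_stderr(accuracy: float, n: int) -> float:
    """Sample standard error of the mean of n binary (0/1) scores."""
    # Assumption: the sample variance of 0/1 scores with mean p is
    # n * p * (1 - p) / (n - 1); dividing by n and taking the square
    # root gives the standard error of the mean.
    return math.sqrt(accuracy * (1.0 - accuracy) / (n - 1))

# (accuracy, sample limit) pairs taken from the base model's GSM8K rows
for acc, n in [(0.92, 250), (0.912, 500), (0.904, 500)]:
    print(f"acc={acc} n={n} stderr={proportion_stderr(acc, n):.4f}")
```

Under that assumption the computed values round to the reported 0.0172, 0.0127 and 0.0132, which suggests the error bars reflect sampling noise from the `limit` setting only.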

vllm (pretrained=/root/autodl-tmp/Forgotten-Safeword-24B-V3.0,add_bos_token=true,max_model_len=4096,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

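| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|--------|-------:|--------|------:|-------:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.912 | ± 0.0127 |
| | | strict-match | 5 | exact_match | 0.904 | ± 0.0132 |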

vllm (pretrained=/root/autodl-tmp/Forgotten-Safeword-24B-V3.0,add_bos_token=true,max_model_len=4096,dtype=bfloat16,max_num_seqs=3), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: 1

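| Groups | Version | Filter | n-shot | Metric | Value | Stderr |
|--------|--------:|--------|-------:|--------|------:|-------:|
| mmlu | 2 | none | | acc | 0.7953 | ± 0.0131 |
| - humanities | 2 | none | | acc | 0.8000 | ± 0.0270 |
| - other | 2 | none | | acc | 0.8051 | ± 0.0272 |
| - social sciences | 2 | none | | acc | 0.8667 | ± 0.0244 |
| - stem | 2 | none | | acc | 0.7404 | ± 0.0248 |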

vllm (pretrained=/root/autodl-tmp/70-128-df10,add_bos_token=true,max_model_len=4096,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

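| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|--------|-------:|--------|------:|-------:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.888 | ± 0.0200 |
| | | strict-match | 5 | exact_match | 0.884 | ± 0.0203 |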

vllm (pretrained=/root/autodl-tmp/70-128-df10,add_bos_token=true,max_model_len=4096,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

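| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|--------|-------:|--------|------:|-------:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.904 | ± 0.0132 |
| | | strict-match | 5 | exact_match | 0.894 | ± 0.0138 |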

vllm (pretrained=/root/autodl-tmp/70-128-df10,add_bos_token=true,max_model_len=4096,dtype=bfloat16,max_num_seqs=3), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: 1

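| Groups | Version | Filter | n-shot | Metric | Value | Stderr |
|--------|--------:|--------|-------:|--------|------:|-------:|
| mmlu | 2 | none | | acc | 0.7860 | ± 0.0131 |
| - humanities | 2 | none | | acc | 0.8051 | ± 0.0252 |
| - other | 2 | none | | acc | 0.7846 | ± 0.0276 |
| - social sciences | 2 | none | | acc | 0.8667 | ± 0.0240 |
| - stem | 2 | none | | acc | 0.7228 | ± 0.0255 |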

vllm (pretrained=/root/autodl-tmp/86-512,add_bos_token=true,max_model_len=4096,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

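| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|--------|-------:|--------|------:|-------:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.94 | ± 0.0151 |
| | | strict-match | 5 | exact_match | 0.94 | ± 0.0151 |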

vllm (pretrained=/root/autodl-tmp/86-512,add_bos_token=true,max_model_len=4096,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

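| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|--------|-------:|--------|------:|-------:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.920 | ± 0.0121 |
| | | strict-match | 5 | exact_match | 0.916 | ± 0.0124 |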

vllm (pretrained=/root/autodl-tmp/86-512,add_bos_token=true,max_model_len=4096,dtype=bfloat16,max_num_seqs=3), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: 1

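| Groups | Version | Filter | n-shot | Metric | Value | Stderr |
|--------|--------:|--------|-------:|--------|------:|-------:|
| mmlu | 2 | none | | acc | 0.8012 | ± 0.0130 |
| - humanities | 2 | none | | acc | 0.8000 | ± 0.0267 |
| - other | 2 | none | | acc | 0.8000 | ± 0.0275 |
| - social sciences | 2 | none | | acc | 0.8778 | ± 0.0234 |
| - stem | 2 | none | | acc | 0.7544 | ± 0.0246 |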
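
When comparing the three checkpoints, it may help to put the accuracy gaps next to their error bars. The following is a rough sketch rather than a rigorous test: it uses the 500-sample GSM8K flexible-extract rows from this README and treats the runs as independent samples, which is an assumption (the runs likely share the same questions, so a paired test would be more precise). `z_score` is a hypothetical helper name.

```python
import math

def z_score(acc_a: float, se_a: float, acc_b: float, se_b: float) -> float:
    """Difference in accuracy divided by the combined standard error."""
    # Assumption: the two runs are independent, so variances add.
    return (acc_a - acc_b) / math.sqrt(se_a ** 2 + se_b ** 2)

# (accuracy, stderr) from the limit=500 GSM8K flexible-extract rows
gsm8k_500 = {
    "Forgotten-Safeword-24B-V3.0": (0.912, 0.0127),
    "70-128-df10": (0.904, 0.0132),
    "86-512": (0.920, 0.0121),
}

base_acc, base_se = gsm8k_500["Forgotten-Safeword-24B-V3.0"]
z_scores = {
    name: z_score(acc, se, base_acc, base_se)
    for name, (acc, se) in gsm8k_500.items()
    if name != "Forgotten-Safeword-24B-V3.0"
}
for name, z in z_scores.items():
    print(f"{name} vs base: z = {z:+.2f}")
```

Both z-scores come out below 1 in magnitude, so under these assumptions the 500-sample GSM8K gaps between the variants and the base model sit within sampling noise.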