Update README.md

tags:
- generated_from_trainer
base_model: 152334H/miqu-1-70b-sf
model-index:
- name: Senku-70B-Full
  results: []
license: cc0-1.0
datasets:
# ShinojiResearch/Senku-70B-Full

[<img src="https://cdna.artstation.com/p/assets/images/images/034/109/324/large/bella-factor-senku-ishigami.jpg?1611427638" width="420">](Senku-70B-Full)

## UPDATE: **85.09** EQ-Bench with ChatML template

* EQ-Bench: (Mistral) *84.89* -> **85.09** (ChatML)
* GSM8k: (Mistral) *77.18* -> **71.04** (ChatML)
* Hellaswag: (Mistral) 87.67 -> ??

Finetune of miqu-70b-sf, a dequant of miqudev's leak of Mistral-70B (allegedly an early Mistral Medium). My diffs are available under CC-0 in the Senku-70B repo; this "Full" repo includes the merge with the leaked model, so you can use the other repository to save bandwidth.

**Update**: Upon further testing, a score of **85.09** was achieved using ChatML instead of Mistral's prompt format.

## Prompt Template

I recommend using the ChatML format instead; I will run more benchmarks. This also fixes the bug with the Miqu dequant failing to provide a stop token.
```
<|im_start|>system
Provide some context and/or instructions to the model.
<|im_end|>
<|im_start|>user
The user's message goes here
<|im_end|>
<|im_start|>assistant <|im_end|>
```
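The template above can be assembled programmatically; a minimal sketch (the helper name and exact newline placement are my assumptions, not part of this repo). Note that `<|im_end|>` doubles as the stop sequence the Mistral-format dequant failed to emit:

```python
def build_chatml_prompt(system: str, user: str) -> str:
    # Assemble a ChatML prompt; generation should stop at <|im_end|>.
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = build_chatml_prompt(
    "Provide some context and/or instructions to the model.",
    "The user's message goes here",
)
print(prompt)
```

When serving the model, pass `<|im_end|>` as a stop string so decoding terminates at the end of the assistant turn.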
38 |
|
39 |
+
## Kudos
|
40 |
+
`Credit to https://twitter.com/hu_yifei for providing GSM & Hellaswag. It is the first open weight model to dethrone GPT-4 on EQ bench.`
|
41 |
|
42 |
## Base Model Details

This model is a fine-tuned version of [152334H/miqu-1-70b-sf](https://huggingface.co/152334H/miqu-1-70b-sf) on the SlimOrca dataset.
It achieves the following results on the evaluation set:
- Loss: 0.3110
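For intuition, the reported evaluation loss maps to a perplexity via `exp(loss)` (this assumes the figure is mean per-token cross-entropy, the usual trainer convention; that is my assumption, not stated in the card):

```python
import math

# Perplexity is exp(cross-entropy loss); assumes the reported
# eval loss of 0.3110 is mean per-token cross-entropy.
eval_loss = 0.3110
perplexity = math.exp(eval_loss)
print(f"{perplexity:.2f}")  # ~1.36
```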