DavidAU committed · verified · Commit 5fb9f4e · Parent(s): 853cbfb

Update README.md

Files changed (1): README.md (+3 -3)
README.md CHANGED
@@ -41,7 +41,7 @@ creating a 15B model with the "Abliterated" (Uncensored) version of Deepseek Qwe
 
 The model is just over 15B because of the unique "shared expert" (roughly 2.5 models here) used in Qwen MOEs.
 
-The oddball configuration yields interesting "thinking/reasoning" which is stronger than either 1.5B model on its own.
+The oddball configuration yields interesting "thinking/reasoning" which is stronger than either 7B model on its own.
 
 Example generations at the bottom of this page.
 
@@ -56,7 +56,7 @@ In Lmstudio the "Jinja Template" should load by default.
 
 In other apps - use the Deepseek Tokenizer and/or "Jinja Template".
 
-This model contains 2 times the power of DeepSeek Distill reasoning/thinking and shows exceptional performance.
+This model contains 2 times the power of DeepSeek Distill 7B reasoning/thinking models and shows exceptional performance.
 
 Also, the DeepSeek Qwen 7B model is based on Qwen's 7B Math model so this model is slanted more towards math/logic problem solving
 and I would also say more "sciency" too.
@@ -64,7 +64,7 @@ and I would also say more "sciency" too.
 This does not mean it will not work for your use case.
 
 Also, because of how this model works (uncensored and censored in the same model) you may want to try 1-4 generations depending
-on your use case because even the "right" response will vary widely, and in many cases be more "interesting".
+on your use case because even the "right" response will vary widely, and in many cases may be more "interesting".
 
 Examples below so you have some idea what this model can do.
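To make the "Deepseek Tokenizer and/or Jinja Template" advice in the second hunk concrete, here is a minimal sketch of rendering a tokenizer's embedded Jinja chat template with the transformers library. The repo id is a hypothetical placeholder for illustration, not a path taken from this commit.

```python
# Minimal sketch: applying the Jinja chat template shipped with a tokenizer.
# "DavidAU/DeepSeek-MOE-2x7B" is a hypothetical repo id, not the real one.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("DavidAU/DeepSeek-MOE-2x7B")

messages = [{"role": "user", "content": "Solve: 17 * 23 = ?"}]

# apply_chat_template renders the Jinja template stored with the tokenizer,
# producing the exact prompt format the model expects.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
print(prompt)
```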
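Likewise, since the README advises trying 1-4 generations per prompt (the mixed censored/uncensored experts make outputs vary run to run), here is a minimal sketch of that loop, assuming llama-cpp-python and a hypothetical GGUF filename; the sampling settings are illustrative, not values from this repo.

```python
# Minimal sketch of the "try 1-4 generations" advice with llama-cpp-python.
# The model filename and sampling settings are assumptions for illustration.
from llama_cpp import Llama

llm = Llama(
    model_path="DeepSeek-MOE-2x7B.Q4_K_M.gguf",  # hypothetical quant filename
    n_ctx=4096,
    n_gpu_layers=-1,  # offload all layers if a GPU is available
)

messages = [{"role": "user", "content": "Explain why the sky is blue."}]

# Collect several candidate generations and pick the best one, since even
# "correct" responses from this merge vary widely between runs.
for i in range(4):
    out = llm.create_chat_completion(
        messages=messages,
        temperature=0.7,
        max_tokens=512,
    )
    print(f"--- generation {i + 1} ---")
    print(out["choices"][0]["message"]["content"])
```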