DavidAU committed
Commit 4751748 · verified · 1 Parent(s): 3b245fc

Update README.md

Files changed (1):
  1. README.md +20 -16
README.md CHANGED
@@ -7,7 +7,7 @@ tags:
 - MOE
 - Qwen 2.5 MOE
 - Mixture of Experts
- - 6X1.5B
+ - 2X1.5B
 - deepseek
 - reasoning
 - thinking
@@ -29,10 +29,14 @@ tags:
pipeline_tag: text-generation
---

- <H2>Qwen2.5-MOE-2X1.5B-DeepSeek-Uncensored-Censored-2.8B-gguf</H2>
+ <H2>Qwen2.5-MOE-2X1.5B-DeepSeek-Uncensored-Censored-4B-gguf</H2>

This is a Qwen2.5 MOE (Mixture of Experts) model comprised of TWO Qwen 2.5 Deepseek (Censored/Normal AND Uncensored) 1.5B models
- creating a 8.71B model with the "Uncensored" version of Deepseek Qwen 2.5 1.5B "in charge" so to speak.
+ creating a 4B model with the "Uncensored" version of Deepseek Qwen 2.5 1.5B "in charge" so to speak.
+
+ The model is just over 4B because of the unique "shared expert" (roughly 2.5 models here) used in Qwen MOEs.
+
+ The oddball configuration yields interesting "thinking/reasoning" which is stronger than either 1.5B model on its own.

This model can be used for all use cases, and is also (mostly) uncensored.

@@ -45,24 +49,22 @@ In Lmstudio the "Jinja Template" should load by default.

In other apps - use the Deepseek Tokenizer and/or "Jinja Template".

- Sometimes this model will output/think in Chinese Characters/Symbols (with an English prompt) - regen to clear.
+ This model contains 2 times the power of DeepSeek Distill reasoning/thinking and shows exceptional performance for a model of its size.

- Sometimes it will work great, other times it will give "so/so" answers and then sometimes it will bat it out of the park, and past the "state line."
+ Be aware however, because this model (and its core models) are so small, certain information may not be understood by the model - i.e.
+ culture references.

- And sometimes it will output well... less than acceptable.
+ In such cases, you may want to provide the model with a more detailed prompt, with information about "references", so it can add this into
+ the reasoning/thinking process.

- It is all over the map.
-
- Four examples below so you have some idea what this model can do.
+ Also, the DeepSeek Qwen 1.5B model is based on Qwen's 1.5B Math model, so this model is slanted more towards math/logic problem solving
+ and I would also say more "sciency" too.

- Keep in mind this model is six 1.5B parameters models working together, and will not have the power of a 14B or 32B reasoning/thinking model.
+ This does not mean it will not work for your use case.

- Also, this model has 4/6 experts activated by default.
-
- You may want to set 6/6 experts for best results.
+ Four examples below so you have some idea what this model can do.

- This model is also mastered in Float 32, which helped overall model generation and addressed some model generation issues
- and oddly seemed to add some new ones (? - Chinese Char/Symb thinking.).
+ Keep in mind this model is two 1.5B parameter models working together, and will not have the power of a 14B or 32B reasoning/thinking model.

Temp of .4 to .8 is suggested, however it will still operate at much higher temps like 1.8, 2.6 etc.

@@ -77,7 +79,9 @@ Likewise it may function better if you breakdown the reasoning/thinking task(s)

Also set context limit at 4k minimum, 8K+ suggested.

- Quants uploaded: Q4_K_S, Q8_0
+ I also suggest a quant of IQ4/Q4 or higher, as larger quants will reason/think and perform much better.
+
+ If you can run Q6/Q8, please use these ones.

---
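For readers who want to try the settings the README suggests (temp .4 to .8, 4K minimum / 8K+ context, a Q4 or better quant), here is a minimal llama-cpp-python sketch. The GGUF file name and the prompt are placeholders rather than the actual uploaded file names, and recent llama-cpp-python builds will normally pick up the Jinja chat template embedded in the GGUF, which is what the README means by using the DeepSeek tokenizer / "Jinja Template".

```python
from llama_cpp import Llama  # pip install llama-cpp-python

llm = Llama(
    model_path="Qwen2.5-MOE-2X1.5B-DeepSeek-Uncensored-Censored-4B-Q6_K.gguf",  # placeholder file name
    n_ctx=8192,       # README: 4K minimum context, 8K+ suggested
    n_gpu_layers=-1,  # offload all layers to GPU if available; use 0 for CPU-only
)

# create_chat_completion() normally applies the chat (Jinja) template stored in the GGUF metadata.
result = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain, step by step, why the sky is blue."}],
    temperature=0.6,  # inside the suggested .4 to .8 range
    max_tokens=1024,
)

print(result["choices"][0]["message"]["content"])
```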