Text Generation
GGUF
English
Chinese
MOE
Qwen 2.5 MOE
Mixture of Experts
Uncensored
2X1.5B
deepseek
reasoning
thinking
creative
128k context
general usage
problem solving
brainstorming
solve riddles
story generation
plot generation
storytelling
fiction story
story
writing
fiction
Qwen 2.5
mergekit
conversational
Update README.md
README.md CHANGED
@@ -7,7 +7,7 @@ tags:
 - MOE
 - Qwen 2.5 MOE
 - Mixture of Experts
--
+- 2X1.5B
 - deepseek
 - reasoning
 - thinking
@@ -29,10 +29,14 @@ tags:
 pipeline_tag: text-generation
 ---
 
-<H2>Qwen2.5-MOE-2X1.5B-DeepSeek-Uncensored-Censored-
+<H2>Qwen2.5-MOE-2X1.5B-DeepSeek-Uncensored-Censored-4B-gguf</H2>
 
 This is a Qwen2.5 MOE (Mixture of Experts) model comprised of TWO Qwen 2.5 Deepseek (Censored/Normal AND Uncensored) 1.5B models,
-creating a
+creating a 4B model with the "Uncensored" version of Deepseek Qwen 2.5 1.5B "in charge," so to speak.
+
+The model is just over 4B because of the unique "shared expert" (roughly 2.5 models here) used in Qwen MOEs.
+
+The oddball configuration yields interesting "thinking/reasoning" which is stronger than either 1.5B model on its own.
 
 This model can be used for all use cases, and is also (mostly) uncensored.
 
@@ -45,24 +49,22 @@ In Lmstudio the "Jinja Template" should load by default.
 
 In other apps - use the Deepseek Tokenizer and/or "Jinja Template".
 
-
+This model contains two times the power of DeepSeek Distill reasoning/thinking and shows exceptional performance for a model of its size.
 
-
+Be aware, however: because this model (and its core models) is so small, certain information may not be understood by the model, e.g.
+cultural references.
 
-
+In such cases, you may want to provide the model with a more detailed prompt, with information about the "references," so it can add this into
+the reasoning/thinking process.
 
-
-
-Four examples below so you have some idea what this model can do.
+Also, the DeepSeek Qwen 1.5B model is based on Qwen's 1.5B Math model, so this model is slanted more towards math/logic problem solving,
+and, I would say, more "sciency" too.
 
-
+This does not mean it will not work for your use case.
 
-
-
-You may want to set 6/6 experts for best results.
+Four examples below so you have some idea what this model can do.
 
-
-and oddly seemed to add some new ones (? - Chinese Char/Symb thinking.).
+Keep in mind this model is two 1.5B parameter models working together, and will not have the power of a 14B or 32B reasoning/thinking model.
 
 Temp of .4 to .8 is suggested; however, it will still operate at much higher temps like 1.8, 2.6, etc.
 
@@ -77,7 +79,9 @@ Likewise it may function better if you breakdown the reasoning/thinking task(s)
 
 Also set context limit at 4k minimum, 8K+ suggested.
 
-
+I also suggest a quant of IQ4/Q4 or higher, as larger quants will reason/think and perform much better.
+
+If you can run Q6/Q8, please use these.
 
 ---
 
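For anyone running the GGUF outside LM Studio, the sketch below shows one way to apply the card's suggested settings (embedded "Jinja" chat template, temp around .6, 8k context) using llama-cpp-python. This is an illustration only, not part of the repo: the model file name is a placeholder for whichever quant you actually download, and any GGUF runner that reads the chat template from the model's metadata should behave the same way.

```python
# Minimal sketch (not from the model card), assuming llama-cpp-python is
# installed and a quant of this model has been downloaded locally.
from llama_cpp import Llama

llm = Llama(
    # Placeholder file name -- substitute the quant you downloaded (Q4 or higher per the card).
    model_path="Qwen2.5-MOE-2X1.5B-DeepSeek-Uncensored-Censored-4B-gguf.Q6_K.gguf",
    n_ctx=8192,       # card suggests 4k minimum, 8k+ context
    verbose=False,
)

# create_chat_completion() applies the chat ("Jinja") template embedded in the
# GGUF metadata, which covers the tokenizer/template advice above.
result = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Think step by step: which is larger, 9.11 or 9.9?"}],
    temperature=0.6,  # suggested range is .4 to .8
    max_tokens=1024,
)
print(result["choices"][0]["message"]["content"])
```

Raising temperature toward 1.8 or 2.6 (which the card says the model tolerates) mainly loosens creative output; for the math/logic tasks this model is slanted toward, the lower end of the suggested range is the safer default.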