DavidAU committed
Commit 4751748 · verified · 1 Parent(s): 3b245fc

Update README.md

Files changed (1):
  1. README.md +20 -16
README.md CHANGED
@@ -7,7 +7,7 @@ tags:
 - MOE
 - Qwen 2.5 MOE
 - Mixture of Experts
- - 6X1.5B
+ - 2X1.5B
 - deepseek
 - reasoning
 - thinking
@@ -29,10 +29,14 @@ tags:
pipeline_tag: text-generation
---

- <H2>Qwen2.5-MOE-2X1.5B-DeepSeek-Uncensored-Censored-2.8B-gguf</H2>
+ <H2>Qwen2.5-MOE-2X1.5B-DeepSeek-Uncensored-Censored-4B-gguf</H2>

This is a Qwen2.5 MOE (Mixture of Experts) model comprised of TWO Qwen 2.5 Deepseek (Censored/Normal AND Uncensored) 1.5B models
- creating a 8.71B model with the "Uncensored" version of Deepseek Qwen 2.5 1.5B "in charge" so to speak.
+ creating a 4B model with the "Uncensored" version of Deepseek Qwen 2.5 1.5B "in charge" so to speak.
+
+ The model is just over 4B because of the unique "shared expert" (roughly 2.5 models here) used in Qwen MOEs.
+
+ The oddball configuration yields interesting "thinking/reasoning" which is stronger than either 1.5B model on its own.

This model can be used for all use cases, and is also (mostly) uncensored.

@@ -45,24 +49,22 @@ In Lmstudio the "Jinja Template" should load by default.

In other apps - use the Deepseek Tokenizer and/or "Jinja Template".

- Sometimes this model will output/think in Chinese Characters/Symbols (with an English prompt) - regen to clear.
+ This model contains 2 times the power of DeepSeek Distill reasoning/thinking and shows exceptional performance for a model of its size.

- Sometimes it will work great, other times it will give "so/so" answers and then sometimes it will bat it out of the park, and past the "state line."
+ Be aware however, because this model (and its core models) are so small, certain information may not be understood by the model - i.e.
+ culture references.

- And sometimes it will output well... less than acceptable.
+ In such cases, you may want to provide the model with a more detailed prompt, with information about "references", so it can add this into
+ the reasoning/thinking process.

- It is all over the map.
-
- Four examples below so you have some idea what this model can do.
+ Also, the DeepSeek Qwen 1.5B model is based on Qwen's 1.5B Math model, so this model is slanted more towards math/logic problem solving
+ and I would also say more "sciency" too.

- Keep in mind this model is six 1.5B parameters models working together, and will not have the power of a 14B or 32B reasoning/thinking model.
+ This does not mean it will not work for your use case.

- Also, this model has 4/6 experts activated by default.
-
- You may want to set 6/6 experts for best results.
+ Four examples below so you have some idea what this model can do.

- This model is also mastered in Float 32, which helped overall model generation and addressed some model generation issues
- and oddly seemed to add some new ones (? - Chinese Char/Symb thinking.).
+ Keep in mind this model is two 1.5B parameter models working together, and will not have the power of a 14B or 32B reasoning/thinking model.

Temp of .4 to .8 is suggested, however it will still operate at much higher temps like 1.8, 2.6 etc.

@@ -77,7 +79,9 @@ Likewise it may function better if you breakdown the reasoning/thinking task(s)

Also set context limit at 4k minimum, 8K+ suggested.

- Quants uploaded: Q4_K_S, Q8_0
+ I also suggest a quant of IQ4/Q4 or higher, as larger quants will reason/think and perform much better.
+
+ If you can run Q6/Q8, please use these ones.

---
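For readers who want to try the settings the README suggests (temp .4 to .8, 4K minimum / 8K+ context, a Q4 or better quant), here is a minimal llama-cpp-python sketch. The GGUF file name and the prompt are placeholders rather than the actual uploaded file names, and recent llama-cpp-python builds will normally pick up the Jinja chat template embedded in the GGUF, which is what the README means by using the DeepSeek tokenizer / "Jinja Template".

```python
from llama_cpp import Llama  # pip install llama-cpp-python

llm = Llama(
    model_path="Qwen2.5-MOE-2X1.5B-DeepSeek-Uncensored-Censored-4B-Q6_K.gguf",  # placeholder file name
    n_ctx=8192,       # README: 4K minimum context, 8K+ suggested
    n_gpu_layers=-1,  # offload all layers to GPU if available; use 0 for CPU-only
)

# create_chat_completion() normally applies the chat (Jinja) template stored in the GGUF metadata.
result = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain, step by step, why the sky is blue."}],
    temperature=0.6,  # inside the suggested .4 to .8 range
    max_tokens=1024,
)

print(result["choices"][0]["message"]["content"])
```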