license: apache-2.0
language:
- en
- zh
tags:
- MOE
- Qwen 2.5 MOE
- Mixture of Experts
- 2X1.5B
- deepseek
- reasoning
- thinking
- creative
- 128k context
- general usage
- problem solving
- brainstorming
- solve riddles
- story generation
- plot generation
- storytelling
- fiction story
- story
- writing
- fiction
- Qwen 2.5
- mergekit
pipeline_tag: text-generation
(quants uploading, examples to be added)
Qwen2.5-MOE-2X1.5B-DeepSeek-Uncensored-Censored-4B-gguf
This is a Qwen2.5 MOE (Mixture of Experts) model composed of TWO Qwen 2.5 DeepSeek (censored/normal AND uncensored) 1.5B models, creating a 4B model with the "Uncensored" version of DeepSeek Qwen 2.5 1.5B "in charge," so to speak.
The model is just over 4B because of the unique "shared expert" (roughly 2.5 models' worth of parameters here) used in Qwen MOEs.
This oddball configuration yields interesting "thinking/reasoning" that is stronger than either 1.5B model on its own.
This model can be used for all use cases, and is also (mostly) uncensored.
Context: 128k.
You need to use the "Jinja Template" encoded in the GGUF to use this model. You might be able to use the Llama 3 and/or ChatML templates if your AI/LLM app cannot access the "Jinja Template".
In LM Studio the "Jinja Template" should load by default.
In other apps, use the DeepSeek tokenizer and/or the "Jinja Template".
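As a minimal sketch, here is one way to load the GGUF with the llama-cpp-python bindings (the filename and quant below are placeholders; substitute whichever GGUF file you downloaded). Recent llama-cpp-python builds read the chat template embedded in the GGUF metadata automatically; if yours does not, ChatML is a reasonable fallback:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="Qwen2.5-MOE-2X1.5B-DeepSeek-Uncensored-Censored-4B-Q8_0.gguf",  # placeholder filename
    n_ctx=8192,  # 4k minimum, 8k+ suggested (see notes below)
    # chat_format="chatml",  # uncomment only if your build cannot use the embedded Jinja template
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain why the sky is blue, step by step."}],
    temperature=0.6,
)
print(out["choices"][0]["message"]["content"])
```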
This model packs two times the DeepSeek Distill reasoning/thinking power and shows exceptional performance for a model of its size.
Be aware, however, that because this model (and its component models) are so small, certain information may not be understood by the model, e.g. cultural references.
In such cases, you may want to give the model a more detailed prompt, with information about the "references," so it can fold them into its reasoning/thinking process.
Also, the DeepSeek Qwen 1.5B model is based on Qwen's 1.5B Math model, so this model is slanted more toward math/logic problem solving and, I would say, "sciency" subjects too.
This does not mean it will not work for your use case.
Four examples appear below so you have some idea of what this model can do.
Keep in mind this model is two 1.5B-parameter models working together, and it will not have the power of a 14B or 32B reasoning/thinking model.
A temp of .4 to .8 is suggested; however, the model will still operate at much higher temps such as 1.8, 2.6, etc.
Depending on your prompt, change temp SLOWLY, e.g. .41, .42, .43 ... and so on, as sketched below.
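As an illustration only, continuing with the `llm` object from the loading sketch above, stepping temperature in small increments might look like this:

```python
# Re-run the same prompt while nudging temperature up in small steps (.41, .42, .43 ...).
prompt = [{"role": "user", "content": "Write one plot outline for a heist story."}]

for temp in (0.41, 0.42, 0.43):
    out = llm.create_chat_completion(messages=prompt, temperature=temp)
    print(f"--- temp={temp} ---")
    print(out["choices"][0]["message"]["content"])
```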
Likewise, because these are small models, the model may do a tonne of "thinking"/"reasoning" and then "forget" to finish the task(s). In this case, prompt the model to "Complete the task XYZ with the 'reasoning plan' above".
Likewise, it may function better if you break the reasoning/thinking task(s) down into smaller pieces: e.g. instead of asking for 6 plots FOR theme XYZ, ask it for ONE plot for theme XYZ at a time (see the sketch below).
Also set the context limit to 4k minimum; 8k+ is suggested.
I also suggest a quant of IQ4/Q4 or higher, as larger quants reason/think and perform much better.
If you can run Q6/Q8, please use those.
Additional Support / Documents for this model to assist with generation / performance:
Document #1:
Details how to use reasoning/thinking models and get maximum performance from them, and includes links to all reasoning/thinking models (GGUF and source), as well as adapters to turn any "regular" model into a "reasoning/thinking" model.
[ https://huggingface.co/DavidAU/How-To-Use-Reasoning-Thinking-Models-and-Create-Them ]
Document #2:
A document detailing all parameters, settings, samplers and advanced samplers for using not only my models but all models (and quants) online (regardless of repo) to their maximum potential. It includes a quick start, detailed notes, notes on AI/LLM apps, and other critical information and references. A must-read if you are using any AI/LLM right now.
Software:
A SOFTWARE patch (by me) for SillyTavern (a front end that connects to multiple AI apps / AIs, such as KoboldCpp, LM Studio, Text Generation Web UI and other APIs) to control and improve the output generation of ANY AI model. It is also designed to control/wrangle some of my more "creative" models and make them perform well with little to no parameter/sampler adjustment.
Example Generation:
Q8_0 quant, temp: 1.5, rep pen: 1.1, top_p: .95, min_p: .05, top_k: 40
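For reference, a hedged sketch of how those exact settings map onto llama-cpp-python arguments (argument names follow llama.cpp conventions; your app may label them differently, and the `llm` object is assumed from the loading sketch above):

```python
# Generation settings used for the example, expressed as llama-cpp-python
# arguments (Q8_0 quant assumed loaded as shown earlier).
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Your prompt here."}],
    temperature=1.5,
    repeat_penalty=1.1,
    top_p=0.95,
    min_p=0.05,
    top_k=40,
)
print(out["choices"][0]["message"]["content"])
```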