---
license: apache-2.0
language:
  - en
  - zh
tags:
  - MOE
  - Qwen 2.5 MOE
  - Mixture of Experts
  - 2X1.5B
  - deepseek
  - reasoning
  - thinking
  - creative
  - 128k context
  - general usage
  - problem solving
  - brainstorming
  - solve riddles
  - story generation
  - plot generation
  - storytelling
  - fiction story
  - story
  - writing
  - fiction
  - Qwen 2.5
  - mergekit
pipeline_tag: text-generation
---

(quants uploading, examples to be added)

Qwen2.5-MOE-2X1.5B-DeepSeek-Uncensored-Censored-4B-gguf

This is a Qwen2.5 MOE (Mixture of Experts) model that combines TWO Qwen 2.5 DeepSeek (censored/normal AND uncensored) 1.5B models into a 4B model, with the "Uncensored" version of DeepSeek Qwen 2.5 1.5B "in charge," so to speak.

The model is just over 4B parameters because of the unique "shared expert" (roughly 2.5 models' worth of weights here) used in Qwen MOEs.

This oddball configuration yields interesting "thinking/reasoning" that is stronger than either 1.5B model on its own.

This model can be used for all use cases, and is also (mostly) uncensored.

Context: 128k.

You need to use the "Jinja Template" encoded in the GGUF to use this model. You might be able to use the Llama 3 and/or ChatML templates if your AI/LLM app cannot access the "Jinja Template".

In LM Studio the "Jinja Template" should load by default.

In other apps - use the DeepSeek tokenizer and/or the "Jinja Template".
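
As a rough illustration, here is a minimal loading sketch assuming llama-cpp-python (an assumption on my part; recent versions apply the Jinja chat template stored in the GGUF metadata automatically when no chat format is forced). The quant filename is hypothetical - substitute the file you downloaded:

```python
# A minimal sketch, assuming llama-cpp-python; recent versions read the
# Jinja chat template from GGUF metadata when no chat_format is forced.
from llama_cpp import Llama

llm = Llama(
    model_path="Qwen2.5-MOE-2X1.5B-DeepSeek-Uncensored-Censored-4B-Q6_K.gguf",  # assumed filename
    n_ctx=8192,  # 8K+ context is suggested further below
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain why the sky is blue."}],
    temperature=0.6,  # inside the .4-.8 reasoning range suggested below
    max_tokens=2048,
)
print(out["choices"][0]["message"]["content"])
```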

This model contains 2 times the power of DeepSeek Distill reasoning/thinking and shows exceptional performance for a model of its size.

Be aware, however, that because this model (and its core models) is so small, certain information may not be understood by the model, e.g., cultural references.

In such cases, you may want to give the model a more detailed prompt, with information about those references, so it can fold them into the reasoning/thinking process.

Also, the DeepSeek Qwen 1.5B model is based on Qwen's 1.5B Math model, so this model is slanted more towards math/logic problem solving, and I would say more "sciency" too.

This does not mean it will not work for your use case.

Likewise, this model may require more direction, more detail, and a clearer statement of what you are asking in the prompt to "think" along "narrower" lines.

It may take 2-4 generations for the model to zero in on / get what you mean and "think" along the correct lines, if your prompt(s) are too short.

Example:

"Come up with six plots for a new "Star Trek" episode (that the audience would love) that all involve time travel."

VS

"Come up with six story plots for a new "Star Trek" (science fiction tv series, set in the 23 century) episode that all involve time travel."

The first prompt MAY generate the correct response (after 1-4 tries), whereas the second one will always work.

Also, because of how this model works (uncensored and censored in the same model), you may want to try 1-4 generations depending on your use case, because even the "right" responses will vary widely and in many cases be more "interesting".

Four examples are below so you have some idea what this model can do.

Keep in mind this model is two 1.5B parameters models working together, and will not have the power of a 14B or 32B reasoning/thinking model.

A temp of .4 to .8 is suggested (for best reasoning/thinking); however, the model will still operate at much higher temps like 1.8, 2.6, etc.

Depending on your prompt, change temp SLOWLY, i.e., .41, .42, .43 ... etc.
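
A sketch of what "changing temp slowly" looks like in practice, reusing the `llm` object from the loading sketch above (the riddle prompt is illustrative):

```python
# Re-run the same prompt while stepping temperature in small increments
# (.40, .41, .42 ...) and compare the outputs, per the advice above.
prompt = [{"role": "user", "content": "Solve the riddle: what has keys but opens no locks?"}]

for temp in (0.40, 0.41, 0.42, 0.43):
    out = llm.create_chat_completion(messages=prompt, temperature=temp, max_tokens=1024)
    print(f"--- temp {temp:.2f} ---")
    print(out["choices"][0]["message"]["content"])
```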

Likewise, because these are small models, the model may do a tonne of "thinking"/"reasoning" and then "forget" to finish the task(s). In this case, prompt the model to "Complete the task XYZ with the 'reasoning plan' above".
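
A sketch of that recovery step, again assuming the `llm` object from the loading sketch (the haiku task is illustrative): feed the model's own "reasoning plan" back to it and explicitly ask it to finish.

```python
# First pass: the model may produce a long "reasoning plan" but stop short.
history = [{"role": "user", "content": "Write six haiku about winter."}]
first = llm.create_chat_completion(messages=history, max_tokens=2048)
plan = first["choices"][0]["message"]["content"]

# Second pass: append the plan and prompt the model to complete the task.
history += [
    {"role": "assistant", "content": plan},
    {"role": "user", "content": "Complete the task (six haiku about winter) "
                                "with the 'reasoning plan' above."},
]
second = llm.create_chat_completion(messages=history, max_tokens=2048)
print(second["choices"][0]["message"]["content"])
```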

Likewise, it may function better if you break the reasoning/thinking task(s) down into smaller pieces, as in the sketch below:

"IE: Instead of asking for 6 plots FOR theme XYZ, ASK IT for ONE plot for theme XYZ at a time."

Also, set the context limit to 4K minimum; 8K+ is suggested.

I also suggest a quant of IQ4/Q4 or higher, as larger quants will reason/"think" and perform much better.

If you can run Q6/Q8, please use these ones.


Additional Support / Documents for this model to assist with generation / performance:

Document #1:

Details how to use reasoning/thinking models and get maximum performance from them, and includes links to all reasoning/thinking models - GGUF and source, as well as adapters to turn any "regular" model into a "reasoning/thinking" model.

[ https://huggingface.co/DavidAU/How-To-Use-Reasoning-Thinking-Models-and-Create-Them ]

Document #2:

Document detailing all parameters, settings, samplers and advanced samplers to use not only my models to their maximum potential, but all models (and quants) online (regardless of repo) to their maximum potential. Includes a quick start, detailed notes, notes on AI/LLM apps, and other critical information and references. A must read if you are using any AI/LLM right now.

[ https://huggingface.co/DavidAU/Maximizing-Model-Performance-All-Quants-Types-And-Full-Precision-by-Samplers_Parameters ]

Software:

A SOFTWARE patch (by me) for Silly Tavern (a front end that connects to multiple AI apps / APIs, such as KoboldCpp, LM Studio, Text Gen Web UI and others) to control and improve output generation of ANY AI model. It is also designed to control/wrangle some of my more "creative" models and make them perform perfectly with little to no parameter/sampler adjustment.

[ https://huggingface.co/DavidAU/AI_Autocorrect__Auto-Creative-Enhancement__Auto-Low-Quant-Optimization__gguf-exl2-hqq-SOFTWARE ]


Example Generation:

Q8_0 quant, temp: 1.5, rep pen: 1.06, top_p: .95, min_p: .05, top_k: 40
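
For reference, a sketch of those exact settings as llama-cpp-python sampler arguments (the Q8_0 filename is an assumption):

```python
from llama_cpp import Llama

# Q8_0 quant with the sampler settings listed above.
llm = Llama(
    model_path="Qwen2.5-MOE-2X1.5B-DeepSeek-Uncensored-Censored-4B-Q8_0.gguf",  # assumed filename
    n_ctx=8192,
)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Your prompt here."}],
    temperature=1.5,
    repeat_penalty=1.06,
    top_p=0.95,
    min_p=0.05,
    top_k=40,
)
print(out["choices"][0]["message"]["content"])
```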