Mistral-2x22B-MOE-Power-Codestral-Ultimate-39B

This repo contains the full-precision source model, in "safetensors" format, for generating GGUF, GPTQ, EXL2, AWQ, HQQ and other quantized formats. The source model can also be used directly.

The two best Mistral coders at 22B, combined into a single 2x22B MOE (Mixture of Experts) model that is stronger than the sum of its parts.

Both models code together and work together.

A "Jinja" chat template (ChatML) has been manually added; it was not available at the original model's release.
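If you use the source model directly, a minimal sketch with Hugging Face transformers (repo ID from this card; the prompt is illustrative):

```python
# Minimal sketch: load the source model and apply the bundled ChatML/Jinja chat template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "DavidAU/Mistral-2x22B-MOE-Power-Codestral-Ultimate-39B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Write a Python function that reverses a linked list."}]
# The manually added Jinja template renders ChatML-style turns.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```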

NOTE: A specialized, optimized GGUF repo will follow.

ABOUT CODESTRAL 22B:

Codestral-22B-v0.1 is trained on a diverse dataset of 80+ programming languages, including the most popular ones, such as Python, Java, C, C++, JavaScript, and Bash (more details in the Blogpost). The model can be queried:

  • As instruct, for instance to answer any questions about a code snippet (write documentation, explain, factorize) or to generate code following specific indications
  • As Fill in the Middle (FIM), to predict the middle tokens between a prefix and a suffix (very useful for software development add-ons like in VS Code); see the FIM sketch below
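Continuing the transformers sketch above, a hedged example of FIM prompting (the [SUFFIX]/[PREFIX] control-token order follows my understanding of the upstream Codestral convention, and treating it as valid for this merge is an assumption):

```python
# FIM sketch: the suffix is placed before the prefix; the model generates the middle.
# Reuses `tokenizer` and `model` from the loading sketch above.
prefix = "def fibonacci(n: int) -> int:\n"
suffix = "\n    return result\n"
fim_prompt = f"[SUFFIX]{suffix}[PREFIX]{prefix}"  # assumed Codestral FIM token order

inputs = tokenizer(fim_prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```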

Limitations

The Codestral-22B-v0.1 does not have any moderation mechanisms. We're looking forward to engaging with the community on ways to make the model finely respect guardrails, allowing for deployment in environments requiring moderated outputs.

License

Codestral-22B-v0.1 is released under the MNPL-0.1 license (Mistral AI Non-Production License).

ABOUT Trinity-2-Codestral-22B:

Trinity is a coding-specific Large Language Model series created by Migel Tissera.


Mistral-2x22B-MOE-Power-Codestral-Ultimate-39B

MOE MODEL - settings, details:


Max context: 32k.

Super special thanks to MistralAI and Migtissera for making such fantastic models.

Suggested Settings:

  • Temp: 0.5 to 0.7 (or lower)
  • top_k: 20, top_p: 0.8, min_p: 0.05 (top_p can also be 0.95 with min_p 0.05)
  • rep pen: 1.1 (can be lower; lower values, specifically 1.02, 1.03 and 1.05, may generate better code)
  • ChatML template OR the bundled Jinja template.
  • A system prompt is not required (tests were run with a blank system prompt).
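These settings map onto llama-cpp-python (for GGUF quants) roughly as follows; the quant filename is a placeholder:

```python
# A sketch of the suggested sampler settings with llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="Mistral-2x22B-MOE-Power-Codestral-Ultimate-39B-Q4_K_M.gguf",  # placeholder quant
    n_ctx=32768,  # max context: 32k
)

out = llm(
    "Write a quicksort implementation in Python.",
    temperature=0.6,     # suggested range 0.5-0.7 (or lower)
    top_k=20,
    top_p=0.8,           # or 0.95 together with min_p 0.05
    min_p=0.05,
    repeat_penalty=1.1,  # lower values (1.02-1.05) may generate better code
    max_tokens=512,
)
print(out["choices"][0]["text"])
```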

System Prompt:

If you want the model to code in specific ways or in specific languages, I suggest creating a system prompt with these instructions.

This will cut down prompt size and focus the model.
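For example (the prompt wording is illustrative, not from this card):

```python
# An illustrative system prompt that pins the model to one language and style.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("DavidAU/Mistral-2x22B-MOE-Power-Codestral-Ultimate-39B")
messages = [
    {"role": "system", "content": (
        "You are an expert Python developer. Write type-hinted, PEP 8 compliant "
        "code with docstrings. Return only code unless asked to explain."
    )},
    {"role": "user", "content": "Implement an LRU cache class."},
]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
```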

Activated Experts:

The model default is 2 activated experts; it will also run with one expert activated.
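With transformers, and assuming this merge uses the Mixtral-style MoE architecture, the number of activated experts can be changed via the config (a sketch; num_experts_per_tok is the Mixtral config field):

```python
# Hedged sketch: lower the number of activated experts from 2 (the default) to 1.
import torch
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "DavidAU/Mistral-2x22B-MOE-Power-Codestral-Ultimate-39B"
config = AutoConfig.from_pretrained(model_id)
config.num_experts_per_tok = 1  # model default is 2
model = AutoModelForCausalLM.from_pretrained(
    model_id, config=config, torch_dtype=torch.bfloat16, device_map="auto"
)
```

(See also the "CHANGE THE NUMBER OF ACTIVE EXPERTS" document linked below.)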

Generation:

Due to the model's configuration, I suggest a minimum of 2 generations if both experts are activated (the default), or 2-4 generations if only one expert is activated.

This will give you a large selection of varied code to choose from.

I also suggest lowering rep pen from 1.1 and generating at least 2 outputs at each lower setting.

These generation suggestions can create stronger, more compact code - and in some cases faster code too.
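A sketch of that workflow with llama-cpp-python (continuing the sampler sketch above):

```python
# Sample several candidates at decreasing repetition penalties, at least 2 per setting,
# then pick the strongest result by inspection or testing.
from llama_cpp import Llama

llm = Llama(model_path="Mistral-2x22B-MOE-Power-Codestral-Ultimate-39B-Q4_K_M.gguf", n_ctx=32768)

prompt = "Write a function that merges two sorted lists."
candidates = []
for rep_pen in (1.1, 1.05, 1.02):
    for _ in range(2):
        out = llm(prompt, temperature=0.6, top_k=20, top_p=0.8, min_p=0.05,
                  repeat_penalty=rep_pen, max_tokens=512)
        candidates.append((rep_pen, out["choices"][0]["text"]))
```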


For more information / other Qwen/Mistral Coders / additional settings see:


[ https://huggingface.co/DavidAU/Qwen2.5-MOE-2x-4x-6x-8x__7B__Power-CODER__19B-30B-42B-53B-gguf ]


Help, Adjustments, Samplers, Parameters and More


CHANGE THE NUMBER OF ACTIVE EXPERTS:

See this document:

https://huggingface.co/DavidAU/How-To-Set-and-Manage-MOE-Mix-of-Experts-Model-Activation-of-Experts

Settings: CHAT / ROLEPLAY and/or SMOOTHER operation of this model:

In "KoboldCpp" or "oobabooga/text-generation-webui" or "Silly Tavern" ;

Set the "Smoothing_factor" to 1.5

: in KoboldCpp -> Settings->Samplers->Advanced-> "Smooth_F"

: in text-generation-webui -> parameters -> lower right.

: In Silly Tavern this is called: "Smoothing"
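For reference, the quadratic "smoothing" transformation behind this setting is commonly implemented along these lines (a sketch of the usual open-source sampler, not code from this repo):

```python
import torch

def smooth_logits(logits: torch.Tensor, smoothing_factor: float = 1.5) -> torch.Tensor:
    """Quadratic sampling as commonly implemented in front-ends like
    text-generation-webui: logits are bent around the current maximum.
    Factors > 1 sharpen the distribution, factors < 1 flatten it;
    token ordering is preserved."""
    max_logit = logits.max()
    return -(smoothing_factor * (logits - max_logit) ** 2) + max_logit
```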

NOTE: For "text-generation-webui", if using GGUFs you need to use "llama_HF" (which involves downloading some config files from the SOURCE version of this model).

Source versions (and config files) of my models are here:

https://huggingface.co/collections/DavidAU/d-au-source-files-for-gguf-exl2-awq-gptq-hqq-etc-etc-66b55cb8ba25f914cbf210be

OTHER OPTIONS:

  • Increase rep pen to 1.1-1.15 (you don't need to do this if you use "smoothing_factor")

  • If the interface/program you are using to run AI models supports "Quadratic Sampling" ("smoothing"), just make the adjustment as noted above.

Highest Quality Settings / Optimal Operation Guide / Parameters and Samplers

This is a "Class 1" model:

For all settings used for this model (including specifics for its "class"), example generations, and an advanced settings guide (which often addresses model issues and covers methods to improve performance for all use cases, including chat and roleplay), please see:

[ https://huggingface.co/DavidAU/Maximizing-Model-Performance-All-Quants-Types-And-Full-Precision-by-Samplers_Parameters ]

