Triangle104/L3.1-Dark-Reasoning-Celeste-V1.2-Hermes-R1-Uncensored-8B-Q4_K_M-GGUF

This model was converted to GGUF format from DavidAU/L3.1-Dark-Reasoning-Celeste-V1.2-Hermes-R1-Uncensored-8B using llama.cpp via the ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.

Context : 128k.

Required: Llama 3 Instruct template.

"Dark Reasoning" is a variable control reasoning model that is uncensored and operates at all temps/settings andis for creative uses cases and general usage.

This version's "thinking"/"reasoning" has been "darkened" by the original CORE model's DNA (see model tree) and will also be shorter and more compressed. Additional system prompts below to take this a lot further - a lot darker, a lot more ... evil.

Higher temps will result in deeper, richer "thoughts"... and frankly more interesting ones too.

The "thinking/reasoning" tech (for the model at this repo) is from the original Llama 3.1 "DeepHermes" model from NousResearch:

[ https://huggingface.co/NousResearch/DeepHermes-3-Llama-3-8B-Preview ]

This version will retain all the functions and features of the original "DeepHermes" model at about 50%-67% of original reasoning power.

Please visit their repo for all information on features, test results and so on.

KNOWN ISSUES:

-You may need to hit regen sometimes to get the thinking/reasoning to activate / get a good "thinking block".

-Sometimes the 2nd or 3rd generation is the best version. Suggest min of 5 for specific creative uses.

-Sometimes the thinking block will end, and you need to manually prompt the model to "generate" the output.

USE CASES:

This model is for all use cases, and but designed for creative use cases specifically.

This model can also be used for solving logic puzzles, riddles, and other problems with the enhanced "thinking" systems.

This model also can solve problems/riddles/ and puzzles normally beyond the abilities of a Llama 3.1 model due to DeepHermes systems.

(It will not however, have the same level of abilities due to Dark Planet core.)

This model WILL produce HORROR / NSFW / uncensored content in EXPLICIT and GRAPHIC DETAIL.

TEMP/SETTINGS:

Set Temp between 0 and .8, higher than this "think" functions will activate differently. The most "stable" temp seems to be .6, with a variance of +-0.05. Lower for more "logic" reasoning, raise it for more "creative" reasoning (max .8 or so). Also set context to at least 4096, to account for "thoughts" generation.

For temps 1+,2+ etc etc, thought(s) will expand, and become deeper and richer.

Set "repeat penalty" to 1.02 to 1.07 (recommended) .

This model requires a Llama 3 Instruct and/or Command-R chat template. (see notes on "System Prompt" / "Role" below) OR standard "Jinja Autoloaded Template" (this is contained in the quant and will autoload)

PROMPTS:

If you enter a prompt without implied "step by step" requirements (ie: Generate a scene, write a story, give me 6 plots for xyz), "thinking" (one or more) MAY activate AFTER first generation. (IE: Generate a scene -> scene will generate, followed by suggestions for improvement in "thoughts")

If you enter a prompt where "thinking" is stated or implied (ie puzzle, riddle, solve this, brainstorm this idea etc), "thoughts" process(es) in Deepseek will activate almost immediately. Sometimes you need to regen it to activate.

You will also get a lot of variations - some will continue the generation, others will talk about how to improve it, and some (ie generation of a scene) will cause the characters to "reason" about this situation. In some cases, the model will ask you to continue generation /thoughts too.

In some cases the model's "thoughts" may appear in the generation itself.

State the word size length max IN THE PROMPT for best results, especially for activation of "thinking." (see examples below)

You may want to try your prompt once at "default" or "safe" temp settings, another at temp 1.2, and a third at 2.5 as an example. This will give you a broad range of "reasoning/thoughts/problem" solving.

GENERATION - THOUGHTS/REASONING:

It may take one or more regens for "thinking" to "activate." (depending on the prompt)

Model can generate a LOT of "thoughts". Sometimes the most interesting ones are 3,4,5 or more levels deep.

Many times the "thoughts" are unique and very different from one another.

Temp/rep pen settings can affect reasoning/thoughts too.

Change up or add directives/instructions or increase the detail level(s) in your prompt to improve reasoning/thinking.

Adding to your prompt: "think outside the box", "brainstorm X number of ideas", "focus on the most uncommon approaches" can drastically improve your results.

GENERAL SUGGESTIONS:

I have found opening a "new chat" per prompt works best with "thinking/reasoning activation", with temp .6, rep pen 1.05 ... THEN "regen" as required.

Sometimes the model will really really get completely unhinged and you need to manually stop it. Depending on your AI app, "thoughts" may appear with "< THINK >" and "</ THINK >" tags AND/OR the AI will generate "thoughts"directly in the main output or later output(s).

Although quant q4KM was used for testing/examples, higher quants will provide better generation / more sound "reasoning/thinking".

Use with llama.cpp

Install llama.cpp through brew (works on Mac and Linux)

brew install llama.cpp

Invoke the llama.cpp server or the CLI.

CLI:

llama-cli --hf-repo Triangle104/L3.1-Dark-Reasoning-Celeste-V1.2-Hermes-R1-Uncensored-8B-Q4_K_M-GGUF --hf-file l3.1-dark-reasoning-celeste-v1.2-hermes-r1-uncensored-8b-q4_k_m.gguf -p "The meaning to life and the universe is"

Server:

llama-server --hf-repo Triangle104/L3.1-Dark-Reasoning-Celeste-V1.2-Hermes-R1-Uncensored-8B-Q4_K_M-GGUF --hf-file l3.1-dark-reasoning-celeste-v1.2-hermes-r1-uncensored-8b-q4_k_m.gguf -c 2048

Note: You can also use this checkpoint directly through the usage steps listed in the Llama.cpp repo as well.

Step 1: Clone llama.cpp from GitHub.

git clone https://github.com/ggerganov/llama.cpp

Step 2: Move into the llama.cpp folder and build it with LLAMA_CURL=1 flag along with other hardware-specific flags (for ex: LLAMA_CUDA=1 for Nvidia GPUs on Linux).

cd llama.cpp && LLAMA_CURL=1 make

Step 3: Run inference through the main binary.

./llama-cli --hf-repo Triangle104/L3.1-Dark-Reasoning-Celeste-V1.2-Hermes-R1-Uncensored-8B-Q4_K_M-GGUF --hf-file l3.1-dark-reasoning-celeste-v1.2-hermes-r1-uncensored-8b-q4_k_m.gguf -p "The meaning to life and the universe is"

./llama-server --hf-repo Triangle104/L3.1-Dark-Reasoning-Celeste-V1.2-Hermes-R1-Uncensored-8B-Q4_K_M-GGUF --hf-file l3.1-dark-reasoning-celeste-v1.2-hermes-r1-uncensored-8b-q4_k_m.gguf -c 2048

Triangle104
/

L3.1-Dark-Reasoning-Celeste-V1.2-Hermes-R1-Uncensored-8B-Q4_K_M-GGUF