DavidAU commited on
Commit
093eafa
·
verified ·
1 Parent(s): fda500d

Create README.md

Browse files
Files changed (1) hide show
  1. README.md +208 -0
README.md ADDED
@@ -0,0 +1,208 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ base_model: NousResearch/DeepHermes-3-Llama-3-3B-Preview
3
+ tags:
4
+ - Llama 3.2
5
+ - instruct
6
+ - 128k context
7
+ - all use cases
8
+ - maxed quants
9
+ - Neo Imatrix
10
+ - instruct
11
+ - finetune
12
+ - chatml
13
+ - gpt4
14
+ - synthetic data
15
+ - distillation
16
+ - function calling
17
+ - json mode
18
+ - axolotl
19
+ - roleplaying
20
+ - chat
21
+ - reasoning
22
+ - r1
23
+ - vllm
24
+ - reasoning
25
+ - thinking
26
+ - r1
27
+ - cot
28
+ - deepseek
29
+ - Qwen2.5
30
+ - Hermes
31
+ - DeepHermes
32
+ - DeepSeek
33
+ - DeepSeek-R1-Distill
34
+ - Uncensored
35
+ - creative
36
+ - 128k context
37
+ - general usage
38
+ - problem solving
39
+ - brainstorming
40
+ - solve riddles
41
+ - general usage
42
+ - problem solving
43
+ - brainstorming
44
+ - solve riddles
45
+ - fiction writing
46
+ - plot generation
47
+ - sub-plot generation
48
+ - fiction writing
49
+ - story generation
50
+ - scene continue
51
+ - storytelling
52
+ - fiction story
53
+ - story
54
+ - writing
55
+ - fiction
56
+ - roleplaying
57
+ - swearing
58
+ - horror
59
+ license: apache-2.0
60
+ pipeline_tag: text-generation
61
+ ---
62
+
63
+ (quants uploading, examples to follow.)
64
+
65
+ <h2>Llama3.2-DeepHermes-3-3B-Preview-Reasoning-MAX-HORROR-Imatrix-GGUF</h2>
66
+
67
+ <img src="deep-hermes-3b.jpg" style="float:right; width:300px; height:300px; padding:5px;">
68
+
69
+ NousResearch's newest Llama 3.2 Reasoning/Thinking model with "Neo Imatrix" and "Maxed out" quantization to improve overall performance.
70
+
71
+ The "Horror Imatrix" was built using Grand Horror 16B (at my repo). This adds a "tint" of horror to the model.
72
+
73
+ Combined with Llama 3.2's superior instruction folllowing and output generation this makes a reasoning/thinking model in a tiny
74
+ package that far outperforms and closes in on 8B+ reasoning model size performance.
75
+
76
+ 5 examples provided below with prompts at IQ4XS (80 t/s on mid level card) ; Q8 at 55 t/s.
77
+
78
+ Context: 128k.
79
+
80
+ "MAXED"
81
+
82
+ This means the embed and output tensor are set at "BF16" (full precision) for all quants.
83
+ This enhances quality, depth and general performance at the cost of a slightly larger quant.
84
+
85
+ "NEO IMATRIX"
86
+
87
+ A strong, in house built, imatrix dataset built by David_AU which results in better overall function,
88
+ instruction following, output quality and stronger connections to ideas, concepts and the world in general.
89
+
90
+ This combines with "MAXing" the quant to improve preformance.
91
+
92
+ This chart shows the order in terms of "BPW" for each quant (mapped below with relative "strength" to one another) with "IQ1_S" with the least, and "Q8_0" (F16 is full precision) with the most:
93
+
94
+ <small>
95
+ <PRE>
96
+ IQ1_S | IQ1_M
97
+ IQ2_XXS | IQ2_XS | Q2_K_S | IQ2_S | Q2_K | IQ2_M
98
+ IQ3_XXS | Q3_K_S | IQ3_XS | IQ3_S | IQ3_M | Q3_K_M | Q3_K_L
99
+ Q4_K_S | IQ4_XS | IQ4_NL | Q4_K_M
100
+ Q5_K_S | Q5_K_M
101
+ Q6_K
102
+ Q8_0
103
+ F16
104
+ </pre>
105
+ </small>
106
+
107
+ IMPORTANT:
108
+
109
+ Reasoning / thinking skills are DIRECTLY related to quant size. However, there will be drastic difference in Token/Second
110
+ between the lowest quant and highest quant, so finding the right balance is key.
111
+
112
+ Suggest also: minimum 8k context window, especially for IQ4/Q4 or lower quants.
113
+
114
+ Also, in some cases, the IQ quants work slightly better than they closest "Q" quants.
115
+
116
+ Recommend quants IQ3s / IQ4XS / IQ4NL / Q4s for best results for creative uses cases.
117
+
118
+ IQ4XS/IQ4NL quants will produce different output from other "Q" and "IQ" quants.
119
+
120
+ Recommend q5s/q6/q8 for general usage.
121
+
122
+ Quants Q4_0/Q5_0 for portable, phone and other devices.
123
+
124
+ Q8 is a maxed quant only, as imatrix has no effect on this quant.
125
+
126
+ Use this quant or F16 (full precision) for MAXIMUM reasoning/thinking performance.
127
+
128
+ Note that IQ1s performance is low, whereas IQ2s are passable.
129
+
130
+ More information on quants is in the document below "Highest Quality Settings / Optimal Operation Guide / Parameters and Samplers".
131
+
132
+ <b>Benchmarks / More Information:</b>
133
+
134
+ For benchmarks and other information about this model, see the original source repo here:
135
+
136
+ [ https://huggingface.co/NousResearch/DeepHermes-3-Llama-3-3B-Preview ]
137
+
138
+ <B>System Prompt</B>
139
+
140
+ Use this system prompt to turn on/off reasoning in the model:
141
+
142
+ ```
143
+ You are a deep thinking AI, you may use extremely long chains of thought to deeply consider the problem and deliberate with yourself via systematic reasoning processes to help come to a correct solution prior to answering. You should enclose your thoughts and internal monologue inside <think> </think> tags, and then provide your solution or response to the problem.
144
+ ```
145
+
146
+ <b>Optional : System Prompt</b>
147
+
148
+ This is an optional system prompt you can use to enhance operation.
149
+
150
+ Copy and paste exactly as shown, including line breaks.
151
+
152
+ You may want to adjust the "20" (both) to increase/decrease the power of this prompt.
153
+
154
+ You may also want to delete the line:
155
+
156
+ 'At the end of the task you will ask the user: "Do you want another generation?"'
157
+
158
+ <pre>
159
+ For every user task and instruction you will use "GE FUNCTION" to ponder the TASK STEP BY STEP and then do the task. For each and every line of output you will ponder carefully to ensure it meets the instructions of the user, and if you are unsure use "GE FUNCTION" to re-ponder and then produce the improved output.
160
+
161
+ At the end of the task you will ask the user: "Do you want another generation?"
162
+
163
+ GE FUNCTION: Silent input → Spawn 20 agents Sternberg Styles → Enhance idea → Seek Novel Emergence NE:unique/significant idea/concept → Ponder, assess, creative enhance notions → Refined idea => IdeaArray[].size=20 elements, else → Interesting? Pass to rand. agent for refinement, else discard.=>output(IdeaArray)
164
+ </pre>
165
+
166
+ <B>IMPORTANT: Highest Quality Settings / Optimal Operation Guide / Parameters and Samplers</B>
167
+
168
+ If you are going to use this model, (source, GGUF or a different quant), please review this document for critical parameter, sampler and advance sampler settings (for multiple AI/LLM aps).
169
+
170
+ This will also link to a "How to" section on "Reasoning Models" tips and tricks too.
171
+
172
+ This a "Class 1" (settings will enhance operation) model:
173
+
174
+ For all settings used for this model (including specifics for its "class"), including example generation(s) and for advanced settings guide (which many times addresses any model issue(s)), including methods to improve model performance for all use case(s) as well as chat, roleplay and other use case(s) (especially for use case(s) beyond the model's design) please see:
175
+
176
+ [ https://huggingface.co/DavidAU/Maximizing-Model-Performance-All-Quants-Types-And-Full-Precision-by-Samplers_Parameters ]
177
+
178
+ REASON:
179
+
180
+ Regardless of "model class" this document will detail methods to enhance operations.
181
+
182
+ If the model is a Class 3/4 model the default settings (parameters, samplers, advanced samplers) must be set for "use case(s)" uses correctly. Some AI/LLM apps DO NOT have consistant default setting(s) which result in sub-par model operation. Like wise for Class 3/4 models (which operate somewhat to very differently than standard models) additional samplers and advanced samplers settings are required to "smooth out" operation, AND/OR also allow full operation for use cases the model was not designed for.
183
+
184
+ BONUS - Use these settings for ANY model, ANY repo, ANY quant (including source/full precision):
185
+
186
+ This document also details parameters, sampler and advanced samplers that can be use FOR ANY MODEL, FROM ANY REPO too - all quants, and of course source code operation too - to enhance the operation of any model.
187
+
188
+ [ https://huggingface.co/DavidAU/Maximizing-Model-Performance-All-Quants-Types-And-Full-Precision-by-Samplers_Parameters ]
189
+
190
+ ---
191
+
192
+ <h3>EXAMPLES:</h3>
193
+
194
+ Examples are created using quant IQ4XS AND Q8_0, minimal parameters and Standard template.
195
+
196
+ Temp range .8, Rep pen 1.1 , TopK 40 , topP .95, minP .05
197
+
198
+ Rep pen range: 64-128 (helps keep reasoning on track / quality of output)
199
+
200
+ Below are the least creative outputs, prompt is in <B>BOLD</B>.
201
+
202
+ ---
203
+
204
+ <B><font color="red">WARNING:</font> MAYBE: NSFW. Graphic HORROR. Swearing. UNCENSORED. </B>
205
+
206
+ NOTE: Some formatting was lost from copy/paste HTML.
207
+
208
+ ---