General overview.
Dear DavidAU.
I've been testing this model for 7+ hours and am astonished by its capabilities: it doesn't break character, keeps a good writing style, and is greatly improved overall. It's remarkable how good an 8B model can be. Thank you for your amazing work; I truly appreciate every bit you do for everyone.
Thank you again.
I was actually surprised myself during testing at how well this model performs.
Going to test the 2m/4m versions next.
It seems some of the 1m "training" transfers to the "core" model - still a lot of questions/things to try.
Yeah, 1M dramatically enhances your models. According to the comparison list they don't differ much, but I'm curious to see and test them further.
Further testing also reveals an extra EOS token: "://".
Excellent; do you remember which extra EOS token it popped out?
It seems there was an update to the tokenizer for the 1 million model after this model was created; that might address this.
Hmmm.
@DavidAU
It's related to Deep Hermes, as Dark-Reasoning-Dark-Planet-Hermes-R1-Uncensored outputs the same. Sometimes it just outputs "://" (without quotes) before triggering an actual EOS token (in my case it's >), or something like:
You are a smart, helpful assistant...
and so on, before triggering an actual EOS token.
@DavidAU After 2+ weeks of testing I noticed how extremely stable this model is (up to temp 2.3); it doesn't mess up memory and names the way Stheno merges do, definitely one of the most stable 8B models so far. Temps 2.4 and 2.5 may sometimes still give good results, and above 2.5 the output shows more and more noticeable confusion.
Lower temps (1.35-1.6 / 1.75) give a good balance and greater stability for longer instructions and character cards.
Overall this model has great performance and outstanding stability and coherence, while providing good creativity.
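For anyone reproducing these temperature ranges, here is a minimal sketch using llama-cpp-python; the library choice, file name and prompts are my own assumptions, not what was used in the tests above:

```python
from llama_cpp import Llama

# Placeholder GGUF path -- substitute the quant you actually downloaded.
llm = Llama(
    model_path="./model-Q4_K_M.gguf",
    n_ctx=8192,        # context window used for the test session
    n_gpu_layers=-1,   # offload all layers if VRAM allows
)

# Lower temps (1.35-1.75) for long instructions / character cards,
# up to ~2.3 when you want maximum creativity while staying coherent.
out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a creative roleplay writer."},
        {"role": "user", "content": "Continue the scene in character."},
    ],
    temperature=1.5,
    max_tokens=512,
)
print(out["choices"][0]["message"]["content"])
```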
Excellent; thank you for the detailed notes.
Just uploading the source for multiple context levels of the Qwen 3 8Bs.
(The "reg" 32k context is already up/quanted => NEO/HORROR versions.)
Found that setting / changing the "core max context" length (via YARN) impacts generation, especially long-form / creative work.
Uploading 64k, 96k, 128k, 192k, 256k and 320k versions.
Likewise, the HORROR / NEO imatrix, when applied to each (and generated at different max context lengths), also affects generation / operation / reasoning.
For creative use -> this can have an extreme impact.
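If you want to compare the context versions yourself, here is a minimal sketch, again assuming llama-cpp-python and placeholder file names; the YARN-extended max context is stored in each GGUF's metadata, so you only choose the window you actually run:

```python
from llama_cpp import Llama

# File names below are placeholders for the 64k / 320k uploads.
for path, n_ctx in [
    ("./qwen3-8b-64k-Q4_K_S.gguf", 16384),
    ("./qwen3-8b-320k-Q4_K_S.gguf", 16384),
]:
    llm = Llama(model_path=path, n_ctx=n_ctx, n_gpu_layers=-1, verbose=False)
    out = llm(
        "Write the opening scene of a slow-burn horror story.",
        temperature=1.5,
        max_tokens=400,
    )
    # Print the start of each completion to compare long-form behavior.
    print(path, "->", out["choices"][0]["text"][:200])
```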
@DavidAU Absolutely great! Unfortunately, I can only test a max of 12K context for Q4KS / Q4KM, or 10K for Q5KS / Q5KM, on my machine.
If it's okay to test your Qwen 3 8B variants with 10K/12K context, I'll be glad to do it.
@VizorZ0042
IQ3_M (imatrix) works very well (as do all IQ3s); likewise IQ4XS/NL.
More context, and a boon for your VRAM.
I tested IQ3_M (imat) with the 320k context version.
NOTE:
Without imatrix, the minimum size is IQ4XS / Q4 or better.