Sao10K committed (verified)
Commit dc16042 · Parent(s): 586c735

Update README.md

Files changed (1)
  1. README.md +5 -1
README.md CHANGED
@@ -13,11 +13,15 @@ Fimbulvetr-v2 but extended to 16K with PoSE. A sane context value would be ~12K
  Note:
  <br> \- I left Rope Theta at 10K for this train, instead of expanding it like with Stheno 3.3. Solar did not play well with extended theta; grad norm / loss values went parabolic or plunged from 10000+ down. Pretty much unreliable, unlike Stheno 3.3's training run.

+ ---
+
  Notes:
- <br> \- I noticed peoplle having bad issues with quants. Be it GGUF or others, at 8 bit or less. Kind of a weird issue? I had little to no issues during testing at the full precision
+ <br> \- I noticed people having bad issues with quants. Be it GGUF or others, at 8 bit or less. Kind of a weird issue? I had little to no issues during testing at full precision.
  <br> \- Slightly different results from base Fimbulvetr-v2, but during my tests they are similar enough. The vibes are still there.
  <br> \- Formatting issues happen rarely. Sometimes. A reroll / regenerate fixes it from tests.
  <br> \- I get consistent and reliable answers at ~11K context.
  <br> \- Still coherent at up to 16K though! It just doesn't work as well there.

+ I recommend sticking to 12K context at most, but loading the model at 16K. It has really accurate context up to 10K, from extended long-context testing. 16K works fine for roleplays, but not for more detailed tasks.
+
  ![Needle](https://huggingface.co/Sao10K/Fimbulvetr-11B-v2.1-16K/resolve/main/output.png)
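
Not part of the commit, but a minimal sketch of the recommendation in the added lines: load the model with its full 16K window while keeping prompts to roughly 12K tokens. It assumes a standard `transformers` setup; the 12288-token cap, dtype, and generation settings are illustrative choices, not taken from the model card.

```python
# Hypothetical usage sketch: load at the full 16K window, keep prompts ~12K.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Sao10K/Fimbulvetr-11B-v2.1-16K"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # pick an appropriate dtype for your hardware
    device_map="auto",    # requires `accelerate`
)

prompt = "A long roleplay or story prompt goes here..."

# Cap the prompt at ~12K tokens (assumed value) even though the model is loaded at 16K.
inputs = tokenizer(
    prompt,
    return_tensors="pt",
    truncation=True,
    max_length=12288,
).to(model.device)

out = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```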