From Ether to Syntax: A Meta-Analytic Exploration of Linguistic Algorithmic Landscapes
continued....
Here is a complete list of the newly added architectures.
The non-mm-archs are picked up automatically when llama is updated (rather, nothing checks for these archs, other than the script that shows me daily models).
Nice. Will do, in case you forgot any vision/audio architecture.
In case you need it, the list/regex is currently in /llmjob/share/llmjob.pm - search for is_vision.
Also, vision is mradermacher code for multi-modal from now on.
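For illustration, here is a minimal shell-level sketch of the kind of check is_vision performs. The regex and architecture names below are placeholders, not the actual list from llmjob.pm:

```bash
# Hypothetical sketch only - the real list/regex lives in /llmjob/share/llmjob.pm (search for is_vision).
# Assumes jq is available and the model's config.json carries an "architectures" array.
arch=$(jq -r '.architectures[0]' config.json)

# Placeholder pattern; substitute the real multi-modal architecture list.
if echo "$arch" | grep -Eq 'Llava|VL|Vision|Audio'; then
  echo "multi-modal (vision/audio) architecture: $arch"
else
  echo "text-only architecture: $arch"
fi
```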
Bert-based architectures seem to be incredibly common.
I might exclude them from the daily list for that reason, and because they are likely not popular with the people who consume GGUFs (and most fail because small models tend to have custom tokenizers).
Nice, I just discovered an easy way to requeue previously failed architectures:
Yup, shell-greppable logs for the win.
Update: oh, it's not even the real log file, "just" the llmc why transform of it.
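For anyone following along, the idea is roughly the pipeline below; the log path, its format, and the field positions are assumptions, not the real llmjob layout:

```bash
# Hypothetical sketch - capture the "llmc why" output and grep the failures out of it.
llmc why > /tmp/llmc-why.log        # assumed usage: dump the transformed log to a file

# Extract the names of models/architectures that failed, deduplicated,
# ready to be fed back into whatever requeue command is appropriate.
grep -i 'fail' /tmp/llmc-why.log | awk '{print $1}' | sort -u
```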
@RichardErkhov vision models should not be queued to rich1 unless they are not being detected as such (and then no vision extraction should happen).
The non-vision jobs are limited to 32GB ram, too. No clue what happened. Very troubling.
However, this morning, only besteffort models were queued on rich1. Who knows what nico queued...
Well, good to know. Usually you take like 4-8GB, but something went wrong today. Peak recorded by Proxmox was 24GB (so I assume it was even higher, but due to the total OOM it might not have recorded the full number). I added swap on root just in case this happens again, so at least other things on the server don't die, haha.
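For reference, a minimal sketch of adding a swap file on the root filesystem; the 8 GB size and the /swapfile path are arbitrary choices:

```bash
# Create and enable a swap file so an OOM in one workload doesn't take the whole host down.
fallocate -l 8G /swapfile          # reserve 8 GB (use dd if the filesystem doesn't support fallocate)
chmod 600 /swapfile                # swap must not be readable by other users
mkswap /swapfile                   # format it as swap
swapon /swapfile                   # enable it immediately
echo '/swapfile none swap sw 0 0' >> /etc/fstab   # keep it across reboots
```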
llmc audit besteffort
skips the besteffort models for me.
Please restart the Audio-Reasoner imatrix computation. I killed it earlier today because it ran on CPU. I'm still not sure what makes GPUs occasionally temporarily disappear, but it seems related to them being used in a different container.
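A simple guard before (re)starting such a job could look like this; the check is just a sketch and the surrounding job invocation is not shown:

```bash
# Abort early if the container currently sees no GPUs, instead of silently falling back to CPU.
if ! nvidia-smi -L >/dev/null 2>&1; then
  echo "No GPUs visible in this container - refusing to start the imatrix job." >&2
  exit 1
fi
nvidia-smi -L   # list the devices that are actually visible right now
```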
llmc audit besteffort skips the besteffort models for me.
Right, arguments were not passed to llmjob audit. Should be fixed now.
Peak recorded by Proxmox was 24GB
Well, given that I was officially allowed to use 64GB, 24GB seems absolutely normal. So what is the new limit? 24GB will only allow one quant, and maybe not even that.
Just a heads-up, I have a rather inconvenient case of food poisoning and won't be very active until I am healthy again.
I hope you feel better again soon.
@mradermacher Please update to the latest version of our llama.cpp fork once you feel well enough to do so. Kimi-K2 support just got merged! I'm so excited to try it out. The latest update also adds support for Plamo2ForCausalLM.
@mradermacher
Once you have updated llama.cpp, please start Kimi-K2-Instruct. I have already updated the source GGUF.
Feeling a bit better, trying to do some simple things. Sheesh, those were two horrible days.
llama is updated, but this message is new:
WARNING: Ignoring invalid distribution ~f-xet (/llmjob/share/python/lib/python3.11/site-packages)
I've restarted kimi, but I don't know if the change invalidated the gguf or not.
Thanks a lot for updating to the latest llama.cpp! Kimi-K2-Instruct is now running successfully. I'm so looking forward to this model.
If you have time, please configure Kimi-K2-Instruct to use imatrix RPC. There is obviously no way F16 or even Q8_0 will fit. Q6_K might still be too big, but Q5_K_M should work. Because we don't know yet, just make the imatrix task use the F16 naming and I will link whatever quant fits.
Edit: Q6_K seems to fit, so we are going to use it for imatrix RPC; feel free to specify this quant when configuring the Kimi K2 RPC imatrix task. I already provided /tmp/Kimi-K2-Instruct.Q6_K.gguf
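The linking idea could be as simple as the sketch below; the F16-style target name is an assumption about what the imatrix task looks for:

```bash
# Point the F16-named path the imatrix task expects at the Q6_K quant that actually fits in memory.
ln -sf /tmp/Kimi-K2-Instruct.Q6_K.gguf /tmp/Kimi-K2-Instruct.gguf   # target name is assumed
ls -lh /tmp/Kimi-K2-Instruct.gguf    # verify the link resolves to the Q6_K file
```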
Feeling a bit better, trying to do some simple things. Sheesh, those were two horrible days.
Glad you feel better again.
I've restarted kimi, but I don't know if the change invalidated the gguf or not.
It did, which is why I regenerated the Kimi-K2-Instruct SOURCE GGUF overnight using my own already-updated llama.cpp build. I even had to update some files in the downloaded model and the BF16 conversion first, as the actual model contained issues and had to be updated as well. Even now, Kimi-K2-Instruct to SOURCE GGUF conversion still requires tiktoken and arbitrary code execution, which, beside its enormous size, is why SOURCE GGUFs for this model need to be provided manually.
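For context, a rough outline of such a manual conversion with llama.cpp's converter; the paths are placeholders and the exact steps used for Kimi-K2-Instruct may differ:

```bash
# Outline only - not the exact pipeline that was used.
pip install tiktoken    # Kimi-K2's tokenizer needs tiktoken at conversion time

# Convert the (fixed-up) HF checkpoint to a BF16 GGUF with llama.cpp's converter.
python llama.cpp/convert_hf_to_gguf.py /path/to/Kimi-K2-Instruct \
  --outtype bf16 \
  --outfile /tmp/Kimi-K2-Instruct.SOURCE.gguf
```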
WARNING: Ignoring invalid distribution ~f-xet (/llmjob/share/python/lib/python3.11/site-packages)
Maybe it's time to give XET another try in the near future, once XET v1.1.6 is out. Currently they are at v1.1.6-rc2. They are also implementing XET in WebAssembly, so even downloads through the HuggingFace website will likely soon use XET.
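As for the "Ignoring invalid distribution ~f-xet" warning itself: that usually just means a leftover, half-removed package directory (pip renames the first character to "~" during an upgrade, and the cleanup got interrupted). A possible cleanup, assuming the site-packages path from the warning and that the package in question is hf_xet:

```bash
# Show the leftover "~"-prefixed directories pip is complaining about...
ls -d /llmjob/share/python/lib/python3.11/site-packages/~f*
# ...remove them, then reinstall the package cleanly.
rm -rf /llmjob/share/python/lib/python3.11/site-packages/~f*
pip install --force-reinstall hf_xet
```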
Please update llama.cpp to the latest version of our fork for https://huggingface.co/mradermacher/model_requests/discussions/1167 and so our entire RPC setup has the same version for the /tmp/Kimi-K2-Instruct.Q6_K.gguf imatrix RPC.
@mradermacher Please update to the latest llama.cpp version of our fork, then remove the override from the ERNIE tasks on nico1 and configure the ERNIE 300B tasks to use RPC imatrix at F16.