From Ether to Syntax: A Meta-Analytic Exploration of Linguistic Algorithmic Landscapes
continued....
Here is a complete list of the newly added architectures.
The non-mm-archs are picked up automatically when llama is updated (rather, nothing checks for these archs, other than the script that shows me daily models).
Nice. Will do, in case you forgot any vision/audio architecture.
In case you need it, the list/regex is currently in /llmjob/share/llmjob.pm - search for is_vision.
Also, vision is mradermacher code for multi-modal from now on.
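For illustration, here is a minimal shell-level sketch of the kind of check is_vision performs. The regex and architecture names below are placeholders, not the actual list from llmjob.pm:

```bash
# Hypothetical sketch only - the real list/regex lives in /llmjob/share/llmjob.pm (search for is_vision).
# Assumes jq is available and the model's config.json carries an "architectures" array.
arch=$(jq -r '.architectures[0]' config.json)

# Placeholder pattern; substitute the real multi-modal architecture list.
if echo "$arch" | grep -Eq 'Llava|VL|Vision|Audio'; then
  echo "multi-modal (vision/audio) architecture: $arch"
else
  echo "text-only architecture: $arch"
fi
```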
Bert-based architectures seem to be incredibly common.
I might exclude them from the daily list for that reason, and because they are likely not popular with the people who consume GGUFs (and most fail because small models tend to have custom tokenizers).
Nice, I just discovered an easy way to requeue previously failed architectures:
Yup, shell-greppable logs for the win.
Update: oh, it's not even the real log file, "just" the llmc why transform of it.
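For anyone following along, the idea is roughly the pipeline below; the log path, its format, and the field positions are assumptions, not the real llmjob layout:

```bash
# Hypothetical sketch - capture the "llmc why" output and grep the failures out of it.
llmc why > /tmp/llmc-why.log        # assumed usage: dump the transformed log to a file

# Extract the names of models/architectures that failed, deduplicated,
# ready to be fed back into whatever requeue command is appropriate.
grep -i 'fail' /tmp/llmc-why.log | awk '{print $1}' | sort -u
```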
@RichardErkhov vision models should not be queued to rich1 unless they are not being detected as such (and then no vision extraction should happen).
The non-vision jobs are limited to 32GB ram, too. No clue what happened. Very troubling.
However, this morning, only besteffort models were queued on rich1. Who knows what nico queued...
Well, good to know. Usually you take like 4-8GB, but something went wrong today. Peak recorded by Proxmox was 24GB (so I assume it was even higher, but due to the total OOM it might not have recorded the full number). I added swap on root just in case this happens again, so at least other things on the server don't die, haha.
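For reference, a minimal sketch of adding a swap file on the root filesystem; the 8 GB size and the /swapfile path are arbitrary choices:

```bash
# Create and enable a swap file so an OOM in one workload doesn't take the whole host down.
fallocate -l 8G /swapfile          # reserve 8 GB (use dd if the filesystem doesn't support fallocate)
chmod 600 /swapfile                # swap must not be readable by other users
mkswap /swapfile                   # format it as swap
swapon /swapfile                   # enable it immediately
echo '/swapfile none swap sw 0 0' >> /etc/fstab   # keep it across reboots
```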
llmc audit besteffort
skips the besteffort models for me.
Please restart the Audio-Reasoner imatrix computation. I killed it earlier today because it ran on CPU. I'm still not sure what makes GPUs occasionally temporarily disappear, but it seems related to them being used in a different container.
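A simple guard before (re)starting such a job could look like this; the check is just a sketch and the surrounding job invocation is not shown:

```bash
# Abort early if the container currently sees no GPUs, instead of silently falling back to CPU.
if ! nvidia-smi -L >/dev/null 2>&1; then
  echo "No GPUs visible in this container - refusing to start the imatrix job." >&2
  exit 1
fi
nvidia-smi -L   # list the devices that are actually visible right now
```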
llmc audit besteffort skips the besteffort models for me.
Right, arguments were not passed to llmjob audit. Should be fixed now.
Peak recorded by Proxmox was 24GB
Well, given that I was officially allowed to use 64GB, 24GB seems absolutely normal. So what is the new limit? 24GB will only allow one quant, and maybe not even that.
Just a heads-up, I have a rather inconvenient case of food poisoning and won't be very active until I am healthy again.
I hope you feel better again soon.
@mradermacher Please update to the latest version of our llama.cpp fork once you feel well enough to do so. Kimi-K2 support just got merged! I'm so excited to try it out. The latest update also adds support for Plamo2ForCausalLM.
@mradermacher
Once you have updated llama.cpp, please start Kimi-K2-Instruct. I have already updated the source GGUF.
Feeling a bit better, trying to do some simple things. Sheesh, those were two horrible days.
llama is updated, but this message is new:
WARNING: Ignoring invalid distribution ~f-xet (/llmjob/share/python/lib/python3.11/site-packages)
I've restarted kimi, but I don't know if the change invalidated the gguf or not.
Thanks a lot for updating to the latest llama.cpp! Kimi-K2-Instruct is now running successfully. I'm so looking forward to this model.
If you have time, please configure Kimi-K2-Instruct to use imatrix RPC. There is obviously no way F16 or even Q8_0 will fit. Q6_K might still be too big, but Q5_K_M should work. Because we don't know yet, just make the imatrix task use the F16 naming and I will link whatever quant fits.
Edit: Q6_K seems to fit, so we are going to use it for imatrix RPC; feel free to specify this quant when configuring the Kimi K2 RPC imatrix task. I already provided /tmp/Kimi-K2-Instruct.Q6_K.gguf
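The linking idea could be as simple as the sketch below; the F16-style target name is an assumption about what the imatrix task looks for:

```bash
# Point the F16-named path the imatrix task expects at the Q6_K quant that actually fits in memory.
ln -sf /tmp/Kimi-K2-Instruct.Q6_K.gguf /tmp/Kimi-K2-Instruct.gguf   # target name is assumed
ls -lh /tmp/Kimi-K2-Instruct.gguf    # verify the link resolves to the Q6_K file
```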
Feeling a bit better, trying to do some simple things. Sheesh, those were two horrible days.
Glad you feel better again.
I've restarted kimi, but I don't know if the change invalidated the gguf or not.
It did, which is why I regenerated the Kimi-K2-Instruct SOURCE GGUF overnight using my own already-updated llama.cpp build. I even had to update some files in the downloaded model and the BF16 conversion first, as the actual model contained issues and had to be updated as well. Even now, Kimi-K2-Instruct to SOURCE GGUF conversion still requires tiktoken and arbitrary code execution, which, beside its enormous size, is why SOURCE GGUFs for this model need to be provided manually.
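For context, a rough outline of such a manual conversion with llama.cpp's converter; the paths are placeholders and the exact steps used for Kimi-K2-Instruct may differ:

```bash
# Outline only - not the exact pipeline that was used.
pip install tiktoken    # Kimi-K2's tokenizer needs tiktoken at conversion time

# Convert the (fixed-up) HF checkpoint to a BF16 GGUF with llama.cpp's converter.
python llama.cpp/convert_hf_to_gguf.py /path/to/Kimi-K2-Instruct \
  --outtype bf16 \
  --outfile /tmp/Kimi-K2-Instruct.SOURCE.gguf
```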
WARNING: Ignoring invalid distribution ~f-xet (/llmjob/share/python/lib/python3.11/site-packages)
Maybe it's time to give XET another try in the near future, once XET v1.1.6 is out. Currently they are at v1.1.6-rc2. They are also implementing XET in WebAssembly, so even downloads through the HuggingFace website will likely soon use XET.
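As for the "Ignoring invalid distribution ~f-xet" warning itself: that usually just means a leftover, half-removed package directory (pip renames the first character to "~" during an upgrade, and the cleanup got interrupted). A possible cleanup, assuming the site-packages path from the warning and that the package in question is hf_xet:

```bash
# Show the leftover "~"-prefixed directories pip is complaining about...
ls -d /llmjob/share/python/lib/python3.11/site-packages/~f*
# ...remove them, then reinstall the package cleanly.
rm -rf /llmjob/share/python/lib/python3.11/site-packages/~f*
pip install --force-reinstall hf_xet
```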
Please update llama.cpp to the latest version of our fork for https://huggingface.co/mradermacher/model_requests/discussions/1167 and so our entire RPC setup has the same version for the /tmp/Kimi-K2-Instruct.Q6_K.gguf imatrix RPC.
@mradermacher Please update to the latest llama.cpp version of our fork, then remove the override from the ERNIE tasks on nico1 and configure the ERNIE 300B tasks to use RPC imatrix at F16.