llama.cpp now supports Llama-3_1-Nemotron-Ultra-253B-v1
#11 opened by ymcki
llama.cpp now supports Nvidia's Llama-3_1-Nemotron-Ultra-253B-v1 as of build b5270.
Hopefully, this can help Nvidia sell some DGX Sparks. ;)
https://github.com/ggml-org/llama.cpp/pull/12843
As you can see, people with multiple cards are having trouble running it due to the tail-heavy nature of this model.
I made an Excel sheet that can help them get started with setting the "-ts" (tensor split) parameter of llama-cli.
Ideally, though, llama.cpp would distribute the layers across cards automatically.
If Nvidia can sponsor me with access to a machine with multiple cards, I can help develop such a feature.
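As a rough illustration of what the spreadsheet (or an automatic feature) has to do: "-ts" takes per-GPU proportions, and llama.cpp assigns whole layers to GPUs according to those proportions, so for a tail-heavy model you want fewer of the big tail layers per card. Below is a minimal sketch, not llama.cpp's actual algorithm; the layer sizes are made-up placeholders, and in practice you would plug in the real per-layer tensor sizes (e.g. read from the GGUF metadata).

```python
def split_layers(layer_sizes, n_gpus):
    """Greedily assign contiguous blocks of layers to GPUs so that each
    block's total size is close to an equal share of the whole model.
    Returns per-GPU layer counts, usable as "-ts" proportions."""
    total = sum(layer_sizes)
    target = total / n_gpus          # equal memory share per GPU
    counts = [0] * n_gpus
    acc, gpu = 0.0, 0
    for size in layer_sizes:
        # move to the next GPU once this one is roughly full
        if gpu < n_gpus - 1 and acc + size / 2 > target:
            gpu += 1
            acc = 0.0
        counts[gpu] += 1
        acc += size
    return counts

# Toy example: 8 layers that grow toward the tail (tail-heavy).
sizes = [1, 1, 1, 2, 2, 3, 4, 6]
print(split_layers(sizes, 2))  # -> [6, 2]: many small head layers on GPU 0
```

With the resulting counts you would run something like `llama-cli -ts 6,2 ...`, instead of the naive even split `-ts 4,4` that overflows the GPU holding the tail.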