llama.cpp now supports Llama-3_1-Nemotron-Ultra-253B-v1


llama.cpp now supports Nvidia's Llama-3_1-Nemotron-Ultra-253B-v1, starting from release b5270.

Hopefully, this can help Nvidia sell some DGX Sparks. ;)

https://github.com/ggml-org/llama.cpp/pull/12843

As you can see from the PR discussion, people with multiple cards are having trouble running it due to the tail-heavy nature of this model: its layers are not uniform in size, and the largest ones sit near the end, so an even split across GPUs overloads the cards that hold the tail.

To help them get started, I made an Excel sheet for working out the "-ts" (tensor split) parameter of llama-cli:

https://huggingface.co/ymcki/Llama-3_1-Nemotron-51B-Instruct-GGUF/resolve/main/deci.xlsx?download=true
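For example, on a hypothetical 4-GPU machine, a starting command might look like the one below. The -ts ratios are placeholders rather than recommended values, and the quant filename is illustrative; work out your own ratios with the sheet above.

```
# Illustrative only: the -ts ratios and the filename are placeholders.
# Because the model is tail-heavy, the later GPUs (which receive the
# later, larger layers) should get a smaller share of the layers.
./llama-cli -m Llama-3_1-Nemotron-Ultra-253B-v1-Q4_K_M.gguf \
    -ngl 99 -sm layer -ts 30,28,22,20 \
    -p "Hello"
```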

Ideally, llama.cpp would distribute the layers automatically.
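To illustrate the idea, here is a minimal sketch of what such a feature might compute. This is not llama.cpp code: it assumes you already know each layer's weight size and each GPU's usable VRAM, and it assigns contiguous layers so that every card's share of the total bytes roughly matches its share of the total memory.

```python
# A minimal sketch, not llama.cpp code. Assumes layer_sizes holds per-layer
# weight sizes in GiB (non-uniform for this model) and vram holds each
# GPU's usable memory in GiB after KV cache and compute buffers.

def suggest_tensor_split(layer_sizes, vram):
    """Assign contiguous layers to GPUs so each card's share of the total
    weight bytes roughly matches its share of total VRAM, and return the
    per-GPU layer counts, usable directly as llama-cli's -ts values."""
    total_bytes = sum(layer_sizes)
    total_vram = sum(vram)
    counts = [0] * len(vram)
    gpu, used = 0, 0.0
    for size in layer_sizes:
        target = total_bytes * vram[gpu] / total_vram
        # Advance to the next GPU once this one reaches its target share,
        # unless it is the last GPU, which must absorb the remainder.
        if used + size > target and gpu < len(vram) - 1:
            gpu, used = gpu + 1, 0.0
        counts[gpu] += 1
        used += size
    return counts

# Toy tail-heavy model: 8 small layers followed by 4 large ones,
# split across two identical 24 GiB cards.
sizes = [1.0] * 8 + [4.0] * 4
print(suggest_tensor_split(sizes, [24, 24]))  # [9, 3]
```

Note how the balanced answer puts 9 layers on the first card but only 3 on the second, even though the cards are identical; that is exactly why an even "-ts" fails on this model.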

If Nvidia can sponsor me with access to a machine with multiple cards, I can help develop such a feature.
