
GGUF support

#4
by RedEyed - opened

Hello, the model looks very promising!
I want to try it locally via llama.cpp/ollama. Will the model be available in GGUF format?

Thank you.

Always the same bulls*** .... nerds get top priority, but the average person who uses GGUF comes second... sigh

I pushed an FP8 safetensors version you can run on a 3090 for now.
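A quick back-of-envelope on why FP8 weights can fit on a 24 GB RTX 3090: FP8 stores one byte per weight versus two for FP16, halving the weight footprint. A minimal sketch; the parameter count below is an assumption for illustration only, not the model's actual size.

```python
# Hedged arithmetic: FP8 vs FP16 weight memory (weights only,
# ignoring KV cache and activations).
params = 9e9                  # assumed ~9B parameters (illustrative)
fp16_gb = params * 2 / 1e9    # 2 bytes per weight in FP16
fp8_gb = params * 1 / 1e9     # 1 byte per weight in FP8
print(f"FP16: {fp16_gb} GB, FP8: {fp8_gb} GB")
```

With these assumed numbers, FP8 leaves comfortable headroom on a 24 GB card where FP16 would be tight once the KV cache is added.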

Working on llama.cpp support today, which is required before a GGUF can even exist. Nemotron-H is a new hybrid architecture.

It’s not some trivial thing. It’s a 57-layer hybrid state-space model interwoven with transformer MLP layers.
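To see why a hybrid stack is harder to support than a plain transformer: the converter and inference engine have to know, layer by layer, whether a block is a state-space (Mamba-style) block or an attention/MLP block. A minimal sketch of such a layer plan; the interleaving pattern and `attention_every` interval below are purely illustrative assumptions, not Nemotron-H's actual layout.

```python
# Hedged sketch: an illustrative layer plan for a 57-layer hybrid
# model mixing state-space blocks with occasional attention blocks.
# The real Nemotron-H interleaving pattern is not reproduced here.
def build_layer_plan(num_layers=57, attention_every=8):
    """Return an illustrative list of block types for a hybrid stack."""
    plan = []
    for i in range(num_layers):
        if i % attention_every == attention_every - 1:
            plan.append("attention")  # occasional self-attention block
        else:
            plan.append("ssm")        # Mamba-style state-space block
    return plan

plan = build_layer_plan()
print(len(plan), plan.count("ssm"), plan.count("attention"))
```

A GGUF converter has to serialize different tensor sets for each block type, which is why new architecture support in llama.cpp itself is the prerequisite, not just a file-format conversion.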
