This model was converted to GGUF format and quantized using the official tools provided by llama.cpp.
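As a sketch, the llama.cpp conversion and quantization pipeline typically looks like the following. The source checkpoint path and the intermediate F16 filename are illustrative assumptions, not the exact commands used for this upload:

```shell
# 1. Convert the original Hugging Face checkpoint to an F16 GGUF file
#    (source path is an assumed, illustrative location):
python convert_hf_to_gguf.py /models/Llama-3.1-Nemotron-Nano-8B-v1 \
    --outfile /models/nvidia-llama-3_1-nemotron-nano-8b-v1-f16.gguf

# 2. Quantize the F16 GGUF down to 4-bit Q4_K_M:
./build/bin/llama-quantize \
    /models/nvidia-llama-3_1-nemotron-nano-8b-v1-f16.gguf \
    /models/nvidia-llama-3_1-nemotron-nano-8b-v1-q4_k_m.gguf \
    Q4_K_M
```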

Serving:

Run interactively with `llama-cli`:

```shell
./build/bin/llama-cli -m /models/nvidia-llama-3_1-nemotron-nano-8b-v1-q4_k_m.gguf -ngl 999
```

Or start an HTTP server with `llama-server`:

```shell
./build/bin/llama-server -m /models/nvidia-llama-3_1-nemotron-nano-8b-v1-q4_k_m.gguf -ngl 999
```

`-ngl 999` offloads all model layers to the GPU; lower the value (or omit the flag) to run partially or fully on CPU.
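Once `llama-server` is running, it exposes an OpenAI-compatible chat endpoint (on `http://localhost:8080` by default). A minimal query sketch, with an illustrative prompt:

```shell
# Send a chat completion request to the local llama-server instance.
# The prompt and max_tokens value are illustrative.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 64
      }'
```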
Model details:

- Format: GGUF
- Model size: 8.03B params
- Architecture: llama
- Quantization: 4-bit (Q4_K_M)