This model was converted to GGUF format and quantized using the official tools provided by llama.cpp.
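As a sketch, the llama.cpp conversion and quantization pipeline typically looks like the following. The source checkpoint path and the intermediate F16 filename are illustrative assumptions, not the exact commands used for this upload:

```shell
# 1. Convert the original Hugging Face checkpoint to an F16 GGUF file
#    (source path is an assumed, illustrative location):
python convert_hf_to_gguf.py /models/Llama-3.1-Nemotron-Nano-8B-v1 \
    --outfile /models/nvidia-llama-3_1-nemotron-nano-8b-v1-f16.gguf

# 2. Quantize the F16 GGUF down to 4-bit Q4_K_M:
./build/bin/llama-quantize \
    /models/nvidia-llama-3_1-nemotron-nano-8b-v1-f16.gguf \
    /models/nvidia-llama-3_1-nemotron-nano-8b-v1-q4_k_m.gguf \
    Q4_K_M
```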

Serving:

Run interactively with `llama-cli`:

```shell
./build/bin/llama-cli -m /models/nvidia-llama-3_1-nemotron-nano-8b-v1-q4_k_m.gguf -ngl 999
```

Or start an HTTP server with `llama-server`:

```shell
./build/bin/llama-server -m /models/nvidia-llama-3_1-nemotron-nano-8b-v1-q4_k_m.gguf -ngl 999
```

`-ngl 999` offloads all model layers to the GPU; lower the value (or omit the flag) to run partially or fully on CPU.
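Once `llama-server` is running, it exposes an OpenAI-compatible chat endpoint (on `http://localhost:8080` by default). A minimal query sketch, with an illustrative prompt:

```shell
# Send a chat completion request to the local llama-server instance.
# The prompt and max_tokens value are illustrative.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 64
      }'
```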
Model details:

- Format: GGUF
- Model size: 8.03B params
- Architecture: llama
- Quantization: 4-bit (Q4_K_M)