TinyLlama-1.1B-Chat - Q4_K_M GGUF

This is a GGUF version of TinyLlama/TinyLlama-1.1B-Chat, quantized to Q4_K_M using llama.cpp's quantization tool.

Model Info

  • Base model: TinyLlama-1.1B-Chat
  • Quantization: Q4_K_M
  • Format: GGUF (compatible with llama.cpp and llama-cpp-python)
  • Size: ~500 MB
  • Use case: Efficient CPU inference for chat and assistant workloads

📦 Files

  • TinyLlama-1.1B-Chat-Q4_K_M.gguf: The quantized model
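
Instead of downloading the file by hand, it can also be fetched programmatically. A minimal sketch using huggingface_hub; the repo id and filename below are taken from this repository, adjust them if you host the file elsewhere:

from huggingface_hub import hf_hub_download

# Download the quantized GGUF from the Hub and get its local cache path
model_path = hf_hub_download(
    repo_id="Arivukkarasu/TinyLlama-1.1B-Chat-GGUF",
    filename="TinyLlama-1.1B-Chat-Q4_K_M.gguf",
)
print(model_path)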

How to Use (with llama-cpp-python)

from llama_cpp import Llama

# Load the quantized GGUF model (CPU inference by default)
llm = Llama(model_path="TinyLlama-1.1B-Chat-Q4_K_M.gguf")

# Plain text completion; the result is a dict with the generated text under "choices"
output = llm("Who are you?", max_tokens=128)
print(output["choices"][0]["text"])
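
Because this is a chat-tuned model, prompts generally work better through the chat API, which applies the chat template stored in the GGUF metadata (or one set explicitly via chat_format). A hedged sketch, assuming a recent llama-cpp-python; the context size and thread count below are illustrative values, not requirements:

from llama_cpp import Llama

# Load the model with an explicit context window and thread count
llm = Llama(
    model_path="TinyLlama-1.1B-Chat-Q4_K_M.gguf",
    n_ctx=2048,    # TinyLlama-Chat supports a 2048-token context
    n_threads=4,   # tune to your CPU
)

# Chat-style request; the chat template comes from the GGUF metadata if present
response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who are you?"},
    ],
    max_tokens=128,
)
print(response["choices"][0]["message"]["content"])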

Credits

Base model by the TinyLlama project; quantization performed with the llama.cpp project's tooling.
