TinyLlama-1.1B-Chat - Q4_K_M GGUF

This is a GGUF version of TinyLlama/TinyLlama-1.1B-Chat, quantized to Q4_K_M using llama.cpp's quantization tool.

Model Info

  • Base model: TinyLlama-1.1B-Chat
  • Quantization: Q4_K_M
  • Format: GGUF (compatible with llama.cpp and llama-cpp-python)
  • Size: ~500 MB
  • Use case: Efficient CPU inference for chat and assistant workloads

📦 Files

  • TinyLlama-1.1B-Chat-Q4_K_M.gguf: The quantized model
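
Instead of downloading the file by hand, it can also be fetched programmatically. A minimal sketch using huggingface_hub; the repo id and filename below are taken from this repository, adjust them if you host the file elsewhere:

from huggingface_hub import hf_hub_download

# Download the quantized GGUF from the Hub and get its local cache path
model_path = hf_hub_download(
    repo_id="Arivukkarasu/TinyLlama-1.1B-Chat-GGUF",
    filename="TinyLlama-1.1B-Chat-Q4_K_M.gguf",
)
print(model_path)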

How to Use (with llama-cpp-python)

from llama_cpp import Llama

# Load the quantized GGUF model (CPU inference by default)
llm = Llama(model_path="TinyLlama-1.1B-Chat-Q4_K_M.gguf")

# Plain text completion; the result is a dict with the generated text under "choices"
output = llm("Who are you?", max_tokens=128)
print(output["choices"][0]["text"])
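
Because this is a chat-tuned model, prompts generally work better through the chat API, which applies the chat template stored in the GGUF metadata (or one set explicitly via chat_format). A hedged sketch, assuming a recent llama-cpp-python; the context size and thread count below are illustrative values, not requirements:

from llama_cpp import Llama

# Load the model with an explicit context window and thread count
llm = Llama(
    model_path="TinyLlama-1.1B-Chat-Q4_K_M.gguf",
    n_ctx=2048,    # TinyLlama-Chat supports a 2048-token context
    n_threads=4,   # tune to your CPU
)

# Chat-style request; the chat template comes from the GGUF metadata if present
response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who are you?"},
    ],
    max_tokens=128,
)
print(response["choices"][0]["message"]["content"])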

Credits

Base model by the TinyLlama project; quantization performed with the llama.cpp project's tooling.
