# TinyLlama-1.1B-Chat - Q4_K_M GGUF
This is a GGUF version of TinyLlama/TinyLlama-1.1B-Chat-v1.0, quantized to Q4_K_M with llama.cpp's quantization tool.
## Model Info
- Base model: TinyLlama-1.1B-Chat
- Quantization: Q4_K_M
- Format: GGUF (compatible with llama.cpp and llama-cpp-python)
- Size: ~500 MB
- Use case: Efficient CPU inference for chat and assistant use-cases
## 📦 Files
- `TinyLlama-1.1B-Chat-Q4_K_M.gguf`: the quantized model
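
If you prefer to fetch the file programmatically, here is a minimal download sketch using `huggingface_hub` (it assumes this repository's id, `Arivukkarasu/TinyLlama-1.1B-Chat-GGUF`, and the filename listed above):

```python
from huggingface_hub import hf_hub_download

# Download the quantized GGUF file from the Hub into the local cache
model_path = hf_hub_download(
    repo_id="Arivukkarasu/TinyLlama-1.1B-Chat-GGUF",
    filename="TinyLlama-1.1B-Chat-Q4_K_M.gguf",
)
print(model_path)  # local path; pass this to Llama(model_path=...)
```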
## How to Use (with llama-cpp-python)
```python
from llama_cpp import Llama

# Load the quantized model (runs on CPU by default)
llm = Llama(model_path="TinyLlama-1.1B-Chat-Q4_K_M.gguf")

# Simple text completion
output = llm("Who are you?", max_tokens=128)
print(output)
```
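
Since this is a chat-tuned model, you may get better results from llama-cpp-python's chat completion API, which applies the model's chat template. A minimal sketch (the `n_ctx` and `n_threads` values are illustrative, not requirements):

```python
from llama_cpp import Llama

llm = Llama(
    model_path="TinyLlama-1.1B-Chat-Q4_K_M.gguf",
    n_ctx=2048,    # context window; adjust as needed
    n_threads=4,   # CPU threads; adjust to your machine
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who are you?"},
    ],
    max_tokens=128,
)
print(response["choices"][0]["message"]["content"])
```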
## Credits
- Original model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
- Quantized with: llama.cpp