Mistral-7B - Q4_K_M GGUF
This is a GGUF version of mistralai/Mistral-7B-Instruct-v0.3, quantized to Q4_K_M using llama.cpp's quantization tool.
Model Info
- Base model: Mistral-7B-Instruct-v0.3
- Quantization: Q4_K_M
- Format: GGUF (compatible with llama.cpp and llama-cpp-python)
- Size: ~4.2 GB
- Use case: general-purpose text generation, chatbots, assistants, and instruction-following tasks; the Q4_K_M quantization is well suited to CPU inference
Files
- Mistral-7B-Instruct-v0.3-Q4_K_M.gguf: the quantized model
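To fetch the file programmatically, one option is huggingface_hub; the repo id below is taken from this page's model tree, so adjust it if the file is hosted elsewhere:

from huggingface_hub import hf_hub_download

# Download the quantized GGUF file from the Hugging Face Hub and get its local path
model_path = hf_hub_download(
    repo_id="Arivukkarasu/Mistral-7B-Instruct-v0.3-GGUF",
    filename="Mistral-7B-Instruct-v0.3-Q4_K_M.gguf",
)
print(model_path)  # path to the cached file, usable as model_path below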
How to Use (with llama-cpp-python)
from llama_cpp import Llama

# Load the quantized model (adjust model_path if the file is stored elsewhere)
llm = Llama(model_path="Mistral-7B-Instruct-v0.3-Q4_K_M.gguf")

# Run a simple completion
output = llm("Explain quantum computing in simple terms.", max_tokens=200)
print(output)
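Because this is an instruction-tuned model, prompts often work better through the chat-style API, which applies the model's chat template; a minimal sketch (the prompt and token limit are only examples):

from llama_cpp import Llama

llm = Llama(model_path="Mistral-7B-Instruct-v0.3-Q4_K_M.gguf")

# create_chat_completion formats the messages with the model's chat template
response = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    max_tokens=200,
)
print(response["choices"][0]["message"]["content"])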
Recommended Settings
- Context size: 4096 tokens is a reasonable default (set via n_ctx; larger contexts may be supported depending on your llama.cpp version)
- Hardware: optimized for CPU (AVX2 or better recommended); also runs efficiently on GPU when llama.cpp is compiled with CUDA or Metal support (see the sketch below)
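One way to apply these settings with llama-cpp-python is through the constructor arguments below; the parameter names are real, but the values are illustrative and depend on your hardware:

from llama_cpp import Llama

llm = Llama(
    model_path="Mistral-7B-Instruct-v0.3-Q4_K_M.gguf",
    n_ctx=4096,       # context window size in tokens
    n_threads=8,      # CPU threads; tune to your physical core count
    n_gpu_layers=-1,  # offload all layers to GPU if built with CUDA/Metal; use 0 for CPU-only
)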
Credits
- Original model: mistralai/Mistral-7B-Instruct-v0.3
- Quantized with: llama.cpp
Model tree for Arivukkarasu/Mistral-7B-Instruct-v0.3-GGUF
- Base model: mistralai/Mistral-7B-v0.3
- Finetuned: mistralai/Mistral-7B-Instruct-v0.3