Mistral-7B-Instruct-v0.3 - Q4_K_M GGUF

This is a GGUF version of mistralai/Mistral-7B-Instruct-v0.3, quantized to Q4_K_M using llama.cpp's quantization tool.

Model Info

  • Base model: Mistral-7B-Instruct-v0.3
  • Quantization: Q4_K_M
  • Format: GGUF (compatible with llama.cpp and llama-cpp-python)
  • Parameters: 7.25B (llama architecture)
  • Size: ~4.2 GB
  • Use case: general-purpose text generation, chatbots, assistants, and instruction-following tasks; well suited to CPU inference

πŸ“¦ Files

  • Mistral-7B-Instruct-v0.3-Q4_K_M.gguf: The quantized model
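
To fetch the file programmatically, huggingface_hub can be used. A minimal sketch, assuming the GGUF file is hosted under this card's repo id (Arivukkarasu/Mistral-7B-Instruct-v0.3-GGUF) with the filename listed above:

from huggingface_hub import hf_hub_download

# Download the quantized GGUF file from the Hub and return its local path
model_path = hf_hub_download(
    repo_id="Arivukkarasu/Mistral-7B-Instruct-v0.3-GGUF",  # repo id as shown on this card
    filename="Mistral-7B-Instruct-v0.3-Q4_K_M.gguf",
)
print(model_path)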

How to Use (with llama-cpp-python)

from llama_cpp import Llama

# Load the quantized model; model_path must point to the downloaded .gguf file
llm = Llama(model_path="Mistral-7B-Instruct-v0.3-Q4_K_M.gguf")
output = llm("Explain quantum computing in simple terms.", max_tokens=200)
print(output["choices"][0]["text"])  # the returned dict nests the generated text
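
Since this is an instruct-tuned model, prompts generally work better through the chat API, which applies the model's chat template when available. A minimal sketch using llama-cpp-python's create_chat_completion (values are illustrative):

from llama_cpp import Llama

llm = Llama(model_path="Mistral-7B-Instruct-v0.3-Q4_K_M.gguf")
# Build a single-turn chat; the chat template is applied automatically
response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain quantum computing in simple terms."}],
    max_tokens=200,
)
print(response["choices"][0]["message"]["content"])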

Recommended Settings

  • Context size: 4096 tokens is a reasonable default (set via n_ctx when loading; larger values increase memory use)
  • Hardware: optimized for CPU (AVX2 or better recommended); also runs efficiently on GPU when llama.cpp is compiled with CUDA or Metal support (see the sketch below)
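
A sketch of loading the model with these settings; n_ctx, n_threads, and n_gpu_layers are standard llama-cpp-python parameters, and the values shown are illustrative rather than required:

from llama_cpp import Llama

llm = Llama(
    model_path="Mistral-7B-Instruct-v0.3-Q4_K_M.gguf",
    n_ctx=4096,       # context window size in tokens
    n_threads=8,      # CPU threads; match your physical core count
    n_gpu_layers=-1,  # offload all layers when built with CUDA/Metal; use 0 for CPU-only
)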

Credits

Base model by Mistral AI (mistralai/Mistral-7B-Instruct-v0.3); quantized to GGUF with llama.cpp.
