---
license: mit
library_name: llama.cpp
tags:
- mistral
- gguf
- onnx
- quantized
- edge-llm
- raspberry-pi
- local-inference
model_creator: mistralai
language: en
---

# Mistral-7B-Instruct-v0.2: Local LLM Model Repository

This repository provides quantized GGUF and ONNX exports of [Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2), optimized for efficient local inference, especially on resource-constrained devices such as the Raspberry Pi.

---

## 🦙 GGUF Model (Q8_0)

- **Filename:** `mistral-7b-instruct-v0.2.Q8_0-Q8_0.gguf`
- **Format:** GGUF (Q8_0)
- **Best for:** `llama.cpp`, `koboldcpp`, LM Studio, and similar tools.

### Quick Start

```bash
# On newer llama.cpp builds the example binary is named llama-cli instead of main.
./main -m mistral-7b-instruct-v0.2.Q8_0-Q8_0.gguf -p "Hello, world!"
```

This quantized GGUF model is designed for fast, memory-efficient inference on local hardware, including the Raspberry Pi and other edge devices.

---

## 🟦 ONNX Model

- **Filename:** `mistral-7b-instruct-v0.2.onnx`
- **Format:** ONNX
- **Best for:** ONNX Runtime, KleidiAI, and compatible frameworks.

### Quick Start

```python
import onnxruntime as ort

session = ort.InferenceSession("mistral-7b-instruct-v0.2.onnx")

# Inspect the graph's expected input names before wiring up generation.
print([i.name for i in session.get_inputs()])
# ... inference code here ...
```

The ONNX export enables efficient inference on CPUs, GPUs, and accelerators, making it well suited to local deployment.

---

## 📋 Credits

- **Base model:** [Mistral AI](https://mistral.ai/)
- **Quantization:** [llama.cpp](https://github.com/ggerganov/llama.cpp)
- **ONNX export:** [Optimum](https://github.com/huggingface/optimum), [ONNX Runtime](https://github.com/microsoft/onnxruntime)

---

**Maintainer:** [Makatia](https://huggingface.co/Makatia)
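
---

## 🐍 Python Example: llama-cpp-python (sketch)

The GGUF quick start above drives `llama.cpp` from the command line. If you prefer Python, the snippet below is a minimal sketch using the third-party `llama-cpp-python` bindings; they are not part of this repository, and the prompt, context size, and thread count are illustrative assumptions to tune for your hardware.

```python
# Minimal sketch (assumption: `pip install llama-cpp-python` and the GGUF file
# sits in the current working directory).
from llama_cpp import Llama

llm = Llama(
    model_path="mistral-7b-instruct-v0.2.Q8_0-Q8_0.gguf",
    n_ctx=2048,    # context window; lower this on memory-constrained boards
    n_threads=4,   # roughly match the number of physical cores (e.g. 4 on a Pi)
)

# Mistral-Instruct models expect the [INST] ... [/INST] prompt template.
output = llm(
    "[INST] Summarize what a Raspberry Pi is in one sentence. [/INST]",
    max_tokens=128,
    temperature=0.7,
)
print(output["choices"][0]["text"])
```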
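
---

## 🧩 Python Example: ONNX Generation via Optimum (sketch)

The ONNX quick start above only opens a raw `onnxruntime` session; full text generation that way means handling the tokenizer and past-key-value inputs yourself. As an alternative, the sketch below uses [Optimum](https://github.com/huggingface/optimum) (already credited for the export), which wraps the ONNX session behind the familiar `generate()` API. The `./onnx` directory name and the presence of `config.json` and tokenizer files next to the graph are assumptions; adjust the paths to match how you lay out this repository's files.

```python
# Hypothetical layout: the ONNX export lives in ./onnx alongside the config and
# tokenizer files that Optimum writes during export.
from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoTokenizer

model_dir = "./onnx"  # adjust to wherever you placed the export
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = ORTModelForCausalLM.from_pretrained(model_dir)

prompt = "[INST] Explain ONNX Runtime in one sentence. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt")
generated = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```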