TinyLlama-1.1B-Chat-v1.0 (ONNX): Local LLM Model Repository

This repository contains quantized ONNX exports of TinyLlama/TinyLlama-1.1B-Chat-v1.0, optimized for efficient local inference on resource-constrained devices such as Raspberry Pi and other ARM-based single-board computers.


🟦 ONNX Model

Included files:

  • model.onnx
  • model_quantized.onnx
  • model.onnx.data (external weight data, if present)
  • Configuration files (config.json, tokenizer.json, etc.)

Recommended for: ONNX Runtime, Arm KleidiAI, and other compatible frameworks.
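
The quantized variant (model_quantized.onnx) trades a small amount of accuracy for a smaller file and faster CPU inference. The exact recipe used for this export is not documented here; the snippet below is only an illustrative sketch of the usual dynamic (weight-only int8) quantization path in ONNX Runtime.

from onnxruntime.quantization import QuantType, quantize_dynamic

# Illustrative only: the shipped model_quantized.onnx may have been produced with a different recipe
quantize_dynamic(
    model_input="model.onnx",
    model_output="model_quantized.onnx",
    weight_type=QuantType.QInt8,
)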

Quick Start

import onnxruntime as ort

# Load the exported graph; pass model_quantized.onnx instead for the smaller int8 variant
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

# List the graph's expected inputs (e.g. input_ids, attention_mask, past key/value tensors)
print([inp.name for inp in session.get_inputs()])

The ONNX export enables efficient inference on CPUs, NPUs, and other accelerators, making it ideal for local or edge deployments.
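
For end-to-end chat generation, one option (not part of this repository) is to drive these files through Hugging Face Optimum, which wraps ONNX Runtime behind the familiar transformers generate API and handles the past key/value cache plumbing that a raw InferenceSession would require by hand. The directory path below is a placeholder for wherever you downloaded the files; model.onnx can be used in place of model_quantized.onnx.

from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoTokenizer

# Placeholder path: point this at the directory holding the files listed above
model_dir = "./tinyllama-1.1b-chat-onnx"

tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = ORTModelForCausalLM.from_pretrained(model_dir, file_name="model_quantized.onnx")

# TinyLlama-Chat ships a chat template in its tokenizer config; use it to build the prompt
messages = [{"role": "user", "content": "Explain ONNX in one sentence."}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")

output_ids = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))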


📋 Credits


Maintainer: Makatia
