TinyLlama-1.1B-Chat-v1.0 (ONNX): Local LLM Model Repository
This repository contains quantized ONNX exports of TinyLlama/TinyLlama-1.1B-Chat-v1.0, optimized for efficient local inference on resource-constrained devices such as Raspberry Pi and other ARM-based single-board computers.
ONNX Model
Included files:
- model.onnx
- model_quantized.onnx
- model.onnx.data (if the export is sharded)
- Configuration files (config.json, tokenizer.json, etc.)
Recommended for: ONNX Runtime, KleidiAI, and other compatible frameworks.
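A quick sanity check before building a session is to confirm the expected files are present (model.onnx.data only exists if the export uses sharded external data). The directory path below is an assumption for illustration, not part of this repository.

from pathlib import Path

model_dir = Path(".")  # assumed: repository files downloaded to the current directory
for name in ["model.onnx", "model_quantized.onnx", "config.json", "tokenizer.json"]:
    print(name, "found" if (model_dir / name).is_file() else "missing")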
Quick Start
import onnxruntime as ort

# Load the standard export; substitute "model_quantized.onnx" for the smaller quantized variant.
session = ort.InferenceSession("model.onnx")
# ... inference code here ...
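For end-to-end text generation, a minimal sketch using Optimum's ONNX Runtime integration follows. It assumes optimum[onnxruntime] and transformers are installed, that the repository files sit in a local directory (the path is hypothetical), and that the export is compatible with the ORTModelForCausalLM loader; the file_name argument selects the quantized variant.

from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForCausalLM

model_dir = "./tinyllama-1.1b-chat-onnx"  # hypothetical local path to this repository
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = ORTModelForCausalLM.from_pretrained(model_dir, file_name="model_quantized.onnx")

# Build a chat-formatted prompt and generate a short reply.
messages = [{"role": "user", "content": "What is ONNX?"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))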
The ONNX export enables efficient inference on CPUs, NPUs, and other accelerators, making it ideal for local or edge deployments.
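On small ARM boards the session can be tuned explicitly; the sketch below pins inference to the CPU execution provider and caps the thread count. The file name and thread count here are assumptions for illustration, not requirements of this repository.

import onnxruntime as ort

opts = ort.SessionOptions()
opts.intra_op_num_threads = 4  # e.g. one thread per core on a Raspberry Pi 4/5
session = ort.InferenceSession(
    "model_quantized.onnx",
    sess_options=opts,
    providers=["CPUExecutionProvider"],
)
print(session.get_providers())  # confirm which providers are active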
Credits
- Base model: TinyLlama
- ONNX export: Optimum, ONNX Runtime
- Model optimization: targeted at ARM devices such as the Raspberry Pi
Maintainer: Makatia