TinyLlama-1.1B-Chat-v1.0 (ONNX): Local LLM Model Repository

This repository contains quantized ONNX exports of TinyLlama/TinyLlama-1.1B-Chat-v1.0, optimized for efficient local inference on resource-constrained devices such as Raspberry Pi and other ARM-based single-board computers.


🟦 ONNX Model

Included files:

  • model.onnx
  • model_quantized.onnx
  • model.onnx.data (external weight data, if present)
  • Configuration files (config.json, tokenizer.json, etc.)

Recommended for: ONNX Runtime, Arm KleidiAI, and other compatible frameworks.
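
The quantized variant (model_quantized.onnx) trades a small amount of accuracy for a smaller file and faster CPU inference. The exact recipe used for this export is not documented here; the snippet below is only an illustrative sketch of the usual dynamic (weight-only int8) quantization path in ONNX Runtime.

from onnxruntime.quantization import QuantType, quantize_dynamic

# Illustrative only: the shipped model_quantized.onnx may have been produced with a different recipe
quantize_dynamic(
    model_input="model.onnx",
    model_output="model_quantized.onnx",
    weight_type=QuantType.QInt8,
)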

Quick Start

import onnxruntime as ort

# Load the exported graph; pass model_quantized.onnx instead for the smaller int8 variant
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

# List the graph's expected inputs (e.g. input_ids, attention_mask, past key/value tensors)
print([inp.name for inp in session.get_inputs()])

The ONNX export enables efficient inference on CPUs, NPUs, and other accelerators, making it ideal for local or edge deployments.
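
For end-to-end chat generation, one option (not part of this repository) is to drive these files through Hugging Face Optimum, which wraps ONNX Runtime behind the familiar transformers generate API and handles the past key/value cache plumbing that a raw InferenceSession would require by hand. The directory path below is a placeholder for wherever you downloaded the files; model.onnx can be used in place of model_quantized.onnx.

from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoTokenizer

# Placeholder path: point this at the directory holding the files listed above
model_dir = "./tinyllama-1.1b-chat-onnx"

tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = ORTModelForCausalLM.from_pretrained(model_dir, file_name="model_quantized.onnx")

# TinyLlama-Chat ships a chat template in its tokenizer config; use it to build the prompt
messages = [{"role": "user", "content": "Explain ONNX in one sentence."}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")

output_ids = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))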


📋 Credits


Maintainer: Makatia
