Mistral-Nemo Instruct 2407 β ONNX FP32 Export
This repository contains the ONNX-formatted FP32 export of the Mistral-Nemo Instruct 2407 model, compatible with ONNX Runtime.
π§ Model Summary
This is the flagship release of the Alex AI project β and to our knowledge, the first-ever open ONNX-format export of Mistral-Nemo Instruct 2407 for full-stack experimentation and deployment.
- Architecture: Mistral-Transformer hybrid, instruction-tuned for reasoning and alignment
- Format: ONNX (graph + external weights)
- Precision: FP32 (float32)
- Exported Using: PyTorch β ONNX via
torch.onnx.export
This model forms the foundation for future research in quantization, NPU acceleration, memory-routing, and lightweight agent design. It is being positioned as a clean and transparent baseline for community optimization β with future support for AMD Vitis AI, Olive, and quantized variants.
π Files Included
File | Description |
---|---|
model.onnx |
The model graph |
model.onnx.data |
External tensor weights (~27GB) |
config.json |
Model configuration metadata |
requirements.txt |
Runtime dependencies |
LICENSE |
Apache 2.0 License |
β Requirements
Install the required packages:
pip install -r requirements.txt
π Usage Example
import onnxruntime as ort
import numpy as np
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
input_ids = np.array([[0, 1, 2, 3, 4]], dtype=np.int64)
attention_mask = np.ones_like(input_ids)
outputs = session.run(None, {
"input_ids": input_ids,
"attention_mask": attention_mask
})
print(outputs[0].shape) # (1, 5, vocab_size)
π‘ Project Vision
The Alex AI project was created to explore whatβs possible when we combine precision reasoning, self-evolving memory, and strict efficiency β all under real-world constraints.
This model is a public cornerstone for research in ONNX deployment, quantization, agent routing, and modular NPU workflows. It is open, transparent, and designed for practical extension.
We believe high-quality tools shouldnβt be locked behind paywalls.
π€ Get Involved
- GitHub: github.com/techAInewb
- Hugging Face: huggingface.co/techAInewb
- Project Chat: Coming soon
Contributions, forks, and optimization experiments are welcome!
πΈ Support the Project
If youβd like to support open-source AI development, please consider donating:
π«Ά Donate via PayPal
Message: "Thank you for your donation to the Alex AI project!"
π License
This model is released under the Apache 2.0 License.
π§ͺ Inference Validation
This model has been validated using ONNX Runtime in a local Windows 11 environment:
- System: AMD Ryzen 5 7640HS, 16GB RAM, RTX 3050 (6GB), Windows 11 Home
- Runtime:
onnxruntime==1.17.0
, Python 3.10, Conda environmentalex-dev
Test inference was run with:
input_ids = np.array([[0, 1, 2, 3, 4]], dtype=np.int64)
attention_mask = np.ones_like(input_ids)
Result:
- β Model loaded and executed without error
- β
Output logits shape:
(1, 5, 131072)
- β οΈ Memory usage may exceed 20GB for full batch sizes β ensure pagefile is set appropriately (we used 350GB)
- π« No GPU or CUDA acceleration used for this test β CPU-only validation
This confirms that full ONNX FP32 export is working and stable, even under real-world hardware constraints.