Mistral-Nemo Instruct 2407 – ONNX FP32 Export

This repository contains the ONNX-formatted FP32 export of the Mistral-Nemo Instruct 2407 model, compatible with ONNX Runtime.

🧠 Model Summary

This is the flagship release of the Alex AI project and, to our knowledge, the first open ONNX-format export of Mistral-Nemo Instruct 2407 for full-stack experimentation and deployment.

  • Architecture: Mistral-Transformer hybrid, instruction-tuned for reasoning and alignment
  • Format: ONNX (graph + external weights)
  • Precision: FP32 (float32)
  • Exported Using: PyTorch → ONNX via torch.onnx.export

This model forms the foundation for future research in quantization, NPU acceleration, memory routing, and lightweight agent design. It is positioned as a clean, transparent baseline for community optimization, with future support planned for AMD Vitis AI, Olive, and quantized variants.

πŸ“ Files Included

File              Description
model.onnx        The ONNX model graph
model.onnx.data   External tensor weights (~27 GB)
config.json       Model configuration metadata
requirements.txt  Runtime dependencies
LICENSE           Apache 2.0 license

✅ Requirements

Install the required packages:

pip install -r requirements.txt
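
The pinned requirements.txt in this repository is the source of truth; judging from the validation environment described below, its contents are likely close to the following sketch (versions here are an assumption, not a copy of the actual file):

```text
onnxruntime==1.17.0
numpy
```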

🚀 Usage Example

import onnxruntime as ort
import numpy as np

# model.onnx.data (the external weights) must sit next to model.onnx.
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

# Dummy token IDs; in practice these come from the Mistral-Nemo tokenizer.
input_ids = np.array([[0, 1, 2, 3, 4]], dtype=np.int64)
attention_mask = np.ones_like(input_ids)

outputs = session.run(None, {
    "input_ids": input_ids,
    "attention_mask": attention_mask
})

print(outputs[0].shape)  # (1, 5, vocab_size)
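
To turn the raw logits into a next-token prediction, a simple greedy argmax over the last sequence position is enough. The sketch below uses a dummy logits array standing in for outputs[0] (the 131072 vocabulary size comes from the validation run documented below); with the real session you would pass outputs[0] directly:

```python
import numpy as np

def next_token(logits: np.ndarray) -> int:
    """Greedy decoding: pick the highest-scoring token at the last position."""
    # logits has shape (batch, seq_len, vocab_size); take the last step of sequence 0.
    return int(np.argmax(logits[0, -1, :]))

# Dummy logits standing in for outputs[0]; shape matches the validated run.
logits = np.zeros((1, 5, 131072), dtype=np.float32)
logits[0, -1, 42] = 1.0
print(next_token(logits))  # 42
```

In a full generation loop you would append the predicted token to input_ids, extend attention_mask by one, and call session.run again.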

💡 Project Vision

The Alex AI project was created to explore what's possible when we combine precision reasoning, self-evolving memory, and strict efficiency, all under real-world constraints.

This model is a public cornerstone for research in ONNX deployment, quantization, agent routing, and modular NPU workflows. It is open, transparent, and designed for practical extension.

We believe high-quality tools shouldn't be locked behind paywalls.

🤝 Get Involved

Contributions, forks, and optimization experiments are welcome!

💸 Support the Project

If you'd like to support open-source AI development, please consider donating:

🫢 Donate via PayPal
Message: "Thank you for your donation to the Alex AI project!"

📜 License

This model is released under the Apache 2.0 License.

🧪 Inference Validation

This model has been validated using ONNX Runtime in a local Windows 11 environment:

  • System: AMD Ryzen 5 7640HS, 16GB RAM, RTX 3050 (6GB), Windows 11 Home
  • Runtime: onnxruntime==1.17.0, Python 3.10, Conda environment alex-dev

Test inference was run with the same dummy inputs as the usage example:

input_ids = np.array([[0, 1, 2, 3, 4]], dtype=np.int64)
attention_mask = np.ones_like(input_ids)

Result:

  • ✅ Model loaded and executed without error
  • ✅ Output logits shape: (1, 5, 131072)
  • ⚠️ Memory usage may exceed 20 GB at full batch sizes; ensure the Windows pagefile is sized appropriately (we used 350 GB)
  • 🚫 No GPU or CUDA acceleration was used for this test; CPU-only validation

This confirms that the full FP32 ONNX export loads and runs stably, even under real-world hardware constraints.
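
As a back-of-the-envelope check on the numbers above (assuming the ~27 GB of FP32 weights are loaded fully into memory), the weights dominate the footprint; the logits tensor for this test input is tiny by comparison:

```python
# Rough memory arithmetic for the validated run.
batch, seq_len, vocab = 1, 5, 131072        # logits shape from the test above
logits_bytes = batch * seq_len * vocab * 4  # float32 = 4 bytes per element
print(logits_bytes)          # 2621440 bytes
print(logits_bytes / 2**20)  # 2.5 MiB
```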
