LLaMA 4 Maverick 17B 128E Instruct (GGUF, Non-Quantized)

This is a non-quantized GGUF conversion of the original meta-llama/Llama-4-Maverick-17B-128E-Instruct model.
It has been converted for compatibility with inference libraries that use the GGUF format, such as llama.cpp, llama-cpp-python, llamafile, and Ollama.
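
As an illustration, here is a minimal sketch of chat inference with llama-cpp-python. The model path, context size, and generation parameters are assumptions and should be adjusted to your local files and hardware.

```python
# Minimal sketch: chat inference via llama-cpp-python (pip install llama-cpp-python).
# The model_path is an assumption -- point it at your local copy of the GGUF file.
from llama_cpp import Llama

llm = Llama(
    model_path="llama-4-maverick-17b-128e-instruct-f16.gguf",  # assumed local filename
    n_ctx=8192,       # context window; lower it to reduce memory use
    n_gpu_layers=-1,  # offload all layers to GPU if VRAM allows; set 0 for CPU-only
)

result = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain what a Mixture-of-Experts model is."}],
    max_tokens=256,
)
print(result["choices"][0]["message"]["content"])
```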


Model Details

  • Architecture: LLaMA 4
  • Model Type: Mixture of Experts (MoE) - 128 Experts
  • Parameters: 17 billion active per token; roughly 401 billion total across the 128 experts
  • Quantization: None (float16 precision)
  • Format: .gguf (non-quantized)

Intended Use

  • Research and evaluation purposes
  • Fine-tuning or quantization into lower-bit formats (q4, q5) for efficient inference; see the quantization sketch after this list
  • Deployment on high-memory systems (256 GB RAM or >512 GB VRAM recommended)
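
As a concrete illustration of the quantization path, the sketch below shells out to llama.cpp's llama-quantize tool to produce a q4_k_m file. The binary location and file names are assumptions; build llama.cpp locally and adjust the paths to your environment.

```python
# Minimal sketch: quantize the f16 GGUF to Q4_K_M with llama.cpp's llama-quantize tool.
# The binary location and file names are assumptions -- adjust to your local build.
import subprocess

src = "llama-4-maverick-17b-128e-instruct-f16.gguf"      # assumed f16 input
dst = "llama-4-maverick-17b-128e-instruct-q4_k_m.gguf"   # quantized output

# llama-quantize usage: <input.gguf> <output.gguf> <type>
subprocess.run(["./llama-quantize", src, dst, "Q4_K_M"], check=True)
```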

Notes

  • This model is extremely large: the unquantized GGUF file is approximately 801 GB (see the footprint sketch after this list).
  • Running this model requires high-end hardware, preferably with multiple GPUs or extremely large VRAM.
  • A quantized version (q4_k_m, q5_0) will be released separately for easier local inference.
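
The file-size figure above follows directly from the parameter count: roughly 401 billion weights at 2 bytes each in float16. The back-of-the-envelope sketch below shows how the weight footprint scales with nominal bit-width; real GGUF quantizations add per-block overhead, and KV cache and activation memory are not included, so treat these as lower bounds.

```python
# Back-of-the-envelope weight footprint at different nominal bit-widths.
# Real GGUF quantizations add per-block overhead; KV cache and activations
# are not included, so treat these as lower bounds.
PARAMS = 401e9  # approximate total parameter count (17B active, 128 experts)

for name, bits in [("f16", 16), ("q8", 8), ("q5", 5), ("q4", 4)]:
    gigabytes = PARAMS * bits / 8 / 1e9
    print(f"{name:>4}: ~{gigabytes:,.0f} GB")
```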

License

This model is distributed under the Meta Llama 4 license; the same terms that apply to the original weights apply to this conversion.
Users must have accepted Meta's license agreement to access and use this model.


Acknowledgments

  • Original model weights: Meta AI
  • GGUF conversion: Done using llama.cpp tooling.

Disclaimer

This is a direct format conversion.
No fine-tuning, evaluation, or modification has been performed beyond reformatting into GGUF.

Use responsibly and within the bounds of the Meta Llama 4 license.
