LLaMA 4 Maverick 17B 128E Instruct (GGUF, Non-Quantized)
This is a non-quantized GGUF conversion of the original meta-llama/Llama-4-Maverick-17B-128E-Instruct model.
It has been converted for compatibility with inference libraries and tools that use the GGUF format, such as llama.cpp, llama-cpp-python, llamafile, and Ollama.
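As an example, the converted file can be loaded directly with llama-cpp-python. The following is a minimal sketch: the model filename, context size, and GPU offload settings are placeholders, and the full float16 weights must fit in available RAM/VRAM.

```python
# Minimal sketch, assuming llama-cpp-python is installed and the GGUF file
# (hypothetical filename below) fits in available memory.
from llama_cpp import Llama

llm = Llama(
    model_path="llama-4-maverick-17b-128e-instruct-f16.gguf",  # placeholder path
    n_ctx=8192,        # context window; adjust to available memory
    n_gpu_layers=-1,   # offload all layers to GPU(s) if VRAM allows
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize the GGUF format in one sentence."}]
)
print(out["choices"][0]["message"]["content"])
```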
Model Details
- Architecture: LLaMA 4
- Model Type: Mixture of Experts (MoE) - 128 Experts
- Parameters: 17 billion active parameters per token; roughly 400 billion total across the 128 experts
- Quantization: None (float16 precision)
- Format: .gguf (non-quantized)
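These details are stored in the GGUF header and can be checked without loading the weights, for example with the gguf Python package that ships alongside llama.cpp. This is a sketch; the filename and the metadata key prefixes are assumptions.

```python
# Minimal sketch, assuming the `gguf` Python package is installed;
# the filename is a placeholder.
from gguf import GGUFReader

reader = GGUFReader("llama-4-maverick-17b-128e-instruct-f16.gguf")

# GGUF keeps model metadata (architecture, expert count, precision, etc.) in
# the header, so this memory-maps the file rather than reading all the weights.
for name, field in reader.fields.items():
    if name.startswith(("general.", "llama4.")):  # key prefixes are assumptions
        print(name)

print(f"{len(reader.tensors)} tensors in file")
```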
Intended Use
- Research and evaluation purposes
- Fine-tuning, or quantization into lower-bit formats (q4, q5) for more efficient inference (see the sketch after this list)
- Deployment on very high-memory systems; the unquantized float16 weights alone are roughly 801 GB, so combined RAM/VRAM on that order is required
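For the quantization use case, the sketch below drives llama.cpp's llama-quantize tool from Python. The binary location, filenames, and quantization type are placeholders, and older llama.cpp builds ship the tool as quantize.

```python
# Minimal sketch of producing a q4_k_m quantization with llama.cpp's
# llama-quantize tool (paths and filenames below are assumptions).
import subprocess

subprocess.run(
    [
        "./llama-quantize",
        "llama-4-maverick-17b-128e-instruct-f16.gguf",     # this repo's f16 file (placeholder name)
        "llama-4-maverick-17b-128e-instruct-q4_k_m.gguf",  # output file
        "Q4_K_M",                                          # target quantization type
    ],
    check=True,
)
```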
Notes
- This model is extremely large. The unquantized GGUF file size is approximately 801 GB.
- Running this model requires high-end hardware, preferably multiple GPUs or a very large pool of VRAM.
- A quantized version (q4_k_m, q5_0) will be released separately for easier local inference.
License
This model is distributed under the Meta Llama 4 license; the same terms as the original model apply.
Users must have accepted the Meta license agreement to access and use this model.
Acknowledgments
Disclaimer
This is a direct format conversion.
No fine-tuning, evaluation, or modification has been performed beyond reformatting into GGUF.
Use responsibly and within the bounds of the Meta Llama 4 license.