Llama 3.1 405B quants and the llama.cpp versions used for quantization

  • IQ1_S: 86.8 GB - b3459
  • IQ1_M: 95.1 GB - b3459
  • IQ2_XXS: 109.0 GB - b3459
  • IQ3_XXS: 157.7 GB - b3484
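
As a convenience, here is a minimal download sketch using `huggingface_hub`. The shard filter below is an assumption about the repo's file naming; check the file list first, since quants this large are typically split into multiple GGUF parts that must end up in the same directory.

```python
# Minimal sketch: fetch all shards of one quant from the Hub.
# Assumes `pip install huggingface_hub`; the "IQ1_S" substring filter is a
# guess at the file naming -- inspect list_repo_files() output to be sure.
from huggingface_hub import hf_hub_download, list_repo_files

repo_id = "etemiz/Llama-3.1-405B-Inst-GGUF"

# Collect every GGUF shard belonging to the IQ1_S quant.
shards = [f for f in list_repo_files(repo_id)
          if "IQ1_S" in f and f.endswith(".gguf")]

for shard in shards:
    # Downloads into the local HF cache; all shards land in one directory,
    # which llama.cpp needs in order to load a split model.
    path = hf_hub_download(repo_id=repo_id, filename=shard)
    print("downloaded:", path)
```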

Quantized from the BF16 GGUF here: https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/

which was in turn converted from Llama 3.1 405B Instruct: https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-Instruct

imatrix file: https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/blob/main/405imatrix.dat
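
For reference, a rough sketch of the pipeline that produces quants like these with llama.cpp (built at the matching tag, e.g. b3459). All paths and output names below are placeholders; note that the 1- and 2-bit IQ types require an imatrix.

```python
# Sketch of the HF -> BF16 GGUF -> IQ quant pipeline, run from a llama.cpp
# checkout. Paths, directory names, and outputs are illustrative placeholders.
import subprocess

# 1. Convert the original HF BF16 weights to a BF16 GGUF.
subprocess.run([
    "python", "convert_hf_to_gguf.py",
    "Meta-Llama-3.1-405B-Instruct",   # local snapshot of the HF repo
    "--outtype", "bf16",
    "--outfile", "405b-bf16.gguf",
], check=True)

# 2. Quantize using the importance matrix linked above; llama-quantize
#    refuses to build IQ1/IQ2 quants without --imatrix.
subprocess.run([
    "./llama-quantize",
    "--imatrix", "405imatrix.dat",
    "405b-bf16.gguf",
    "405b-IQ1_S.gguf",
    "IQ1_S",                          # target quant type
], check=True)
```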

Let me know if you need bigger quants.

Sponsored by: https://pickabrain.ai
