Excited to announce the release of our high-quality 4-bit HQQ/calibrated quantized Llama-3.1 8B model! It achieves an impressive 99.3% of FP16 performance and delivers the fastest inference speed among transformers backends.
mobiuslabsgmbh/Llama-3.1-8b-instruct_4bitgs64_hqq_calib