AutoRound has demonstrated strong results even at 2-bit precision for vision-language models such as Qwen2-VL-72B. Check it out here: OPEA/Qwen2-VL-72B-Instruct-int2-sym-inc.
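A minimal loading sketch for this checkpoint, assuming auto-round is installed (importing AutoRoundConfig registers the auto-round quantization format with transformers); the rest follows the usual Qwen2-VL flow:

```python
from auto_round import AutoRoundConfig  # noqa: F401 -- registers the auto-round format
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor

model_id = "OPEA/Qwen2-VL-72B-Instruct-int2-sym-inc"
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",  # a 72B model still needs substantial GPU memory, even at INT2
)
processor = AutoProcessor.from_pretrained(model_id)
```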
This week, OPEA Space released several new INT4 models, including nvidia/Llama-3.1-Nemotron-70B-Instruct-HF, allenai/OLMo-2-1124-13B-Instruct, THUDM/glm-4v-9b, AIDC-AI/Marco-o1, and several others. Let us know which models you'd like prioritized for quantization, and we'll do our best to make it happen!
The OPEA space has just released nearly 20 INT4 models, for example QwQ-32B-Preview, Llama-3.2-11B-Vision-Instruct, Qwen2.5, and Llama 3.1. Check out https://huggingface.co/OPEA
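The text-only checkpoints load through the standard transformers API. A minimal sketch, assuming auto-round is installed; the repo id below is hypothetical, so substitute any actual model from the OPEA page:

```python
from auto_round import AutoRoundConfig  # noqa: F401 -- registers the auto-round format
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id for illustration; pick a real one from https://huggingface.co/OPEA
model_id = "OPEA/QwQ-32B-Preview-int4-sym-inc"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("There is a girl who likes adventure,", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=50)[0]))
```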
- Pre-training code with nanotron
- Evaluation suite with lighteval
- Synthetic data generation using distilabel (powers our new SFT dataset HuggingFaceTB/smoltalk; see the loading sketch below)
- Post-training scripts with TRL & the alignment handbook
- On-device tools with llama.cpp for summarization, rewriting & agents
Apache 2.0 licensed. V2 pre-training data mix coming soon!
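For anyone who wants to poke at the SFT data, a minimal sketch with the datasets library; the "all" config name is an assumption, so check the dataset card for the available subsets:

```python
from datasets import load_dataset

# "all" is an assumed config name -- see the HuggingFaceTB/smoltalk dataset card for actual subsets.
ds = load_dataset("HuggingFaceTB/smoltalk", "all", split="train")
print(ds[0]["messages"])  # assumes a chat-style "messages" column, as is typical for SFT datasets
```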
Looking for a better INT4 algorithm for Llama 3.1? For the 8B model, AutoRound achieves a higher average across 10 zero-shot tasks, scoring 63.93 versus 63.15 for AWQ. Notably, on the MMLU task it achieved 66.72 compared to 65.25, and on the ARC-C task it scored 52.13 against 50.94. For further details and comparisons, visit the leaderboard at Intel/low_bit_open_llm_leaderboard.
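To produce a W4 checkpoint along these lines yourself, a sketch using the auto-round quantization API; the group size of 128 and symmetric quantization are assumptions, not necessarily the exact leaderboard recipe:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "meta-llama/Llama-3.1-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# W4G128 symmetric: a common INT4 recipe (assumed, not the exact leaderboard config)
autoround = AutoRound(model, tokenizer, bits=4, group_size=128, sym=True)
autoround.quantize()
autoround.save_quantized("./Llama-3.1-8B-Instruct-int4", format="auto_round")
```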