GPTQ-quantized Gemma3 models (https://arxiv.org/abs/2502.05003)
Models prequantized with [HIGGS](https://arxiv.org/abs/2411.17525) zero-shot quantization. Requires the latest `transformers` to run.
- Pushing the Limits of Large Language Model Quantization via the Linearity Theorem (Paper • 2411.17525 • Published • 4)
- ISTA-DASLab/Llama-3.3-70B-Instruct-HIGGS-GPTQ-4bit (19B • Updated • 67 • 7)
- ISTA-DASLab/Llama-3.1-8B-Instruct-HIGGS-GPTQ-4bit (Text Generation • 3B • Updated • 15)
- ISTA-DASLab/Llama-3.1-8B-Instruct-HIGGS-GPTQ-3bit (Text Generation • 2B • Updated • 20)
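The note above says these prequantized checkpoints need a recent `transformers` release. A minimal loading sketch (the model id comes from the list above; the helper name is ours, and `device_map="auto"` is a standard `transformers` option, not anything HIGGS-specific):

```python
# Minimal sketch: load a prequantized HIGGS checkpoint with `transformers`.
# Assumes a recent `transformers` release; the weights are stored already
# quantized, so no quantization config is passed at load time.
from transformers import AutoModelForCausalLM, AutoTokenizer

HIGGS_MODEL_ID = "ISTA-DASLab/Llama-3.1-8B-Instruct-HIGGS-GPTQ-4bit"

def load_prequantized(model_id: str = HIGGS_MODEL_ID):
    """Fetch the tokenizer and model; `device_map="auto"` places the
    weights on whatever accelerators are available."""
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    return tokenizer, model

# Usage (downloads the full checkpoint, so not run here):
# tokenizer, model = load_prequantized()
# inputs = tokenizer("Hello!", return_tensors="pt").to(model.device)
# out = model.generate(**inputs, max_new_tokens=32)
# print(tokenizer.decode(out[0], skip_special_tokens=True))
```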
AQLM quantized LLMs
- Extreme Compression of Large Language Models via Additive Quantization (Paper • 2401.06118 • Published • 13)
- ISTA-DASLab/Meta-Llama-3-70B-Instruct-AQLM-2Bit-1x16 (Text Generation • 11B • Updated • 78 • 20)
- ISTA-DASLab/Meta-Llama-3-70B-AQLM-2Bit-1x16 (Text Generation • 11B • Updated • 40 • 14)
- ISTA-DASLab/Meta-Llama-3-8B-Instruct-AQLM-2Bit-1x16 (Text Generation • 2B • Updated • 1.83k • 12)
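The "2Bit" in these checkpoint names refers to roughly two bits per weight, which is where the memory savings come from. A back-of-the-envelope estimate (the helper is our own illustration; it ignores AQLM codebooks, scales, and unquantized layers such as embeddings, so real checkpoints are somewhat larger):

```python
# Rough size of the quantized weights alone for a 2-bit AQLM checkpoint.
# Sketch only: codebooks, scales, and unquantized layers (e.g. embeddings)
# add overhead on top of this figure.
def quantized_weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate storage for the quantized weight tensors, in GiB."""
    return n_params * bits_per_weight / 8 / 2**30

print(round(quantized_weight_gib(70e9, 2.0), 1))  # ~16.3 GiB for a 70B model
print(round(quantized_weight_gib(8e9, 2.0), 1))   # ~1.9 GiB for an 8B model
```

This is why a 2-bit 70B checkpoint can fit on a single consumer GPU, whereas the same model in 16-bit precision needs around 140 GB for the weights alone.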
Official AQLM quantizations for "PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression": https://arxiv.org/abs/2405.14852
- PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression (Paper • 2405.14852 • Published • 2)
- ISTA-DASLab/Meta-Llama-3.1-70B-Instruct-AQLM-PV-2Bit-1x16 (Text Generation • 11B • Updated • 25 • 46)
- ISTA-DASLab/Mistral-Nemo-Instruct-2407-AQLM-PV-2Bit-1x16-hf (3B • Updated • 9 • 3)
- ISTA-DASLab/Meta-Llama-3.1-8B-Instruct-AQLM-PV-2Bit-1x16-hf (Text Generation • 2B • Updated • 1.75k • 8)