arXiv:2508.03332

Exploring Layer-wise Information Effectiveness for Post-Training Quantization in Small Language Models

Published on Aug 5, 2025
AI-generated summary

LieQ, a post-training quantization framework, achieves high accuracy in sub-7B models under extreme low-bit compression, outperforming existing methods and enabling deployment on edge devices.

Abstract

Large language models with billions of parameters are often over-provisioned: many layers contribute little unique information yet dominate the memory and energy footprint during inference. We present LieQ, a metric-driven post-training quantization framework that addresses the critical challenge of maintaining accuracy in sub-7B models under extreme low-bit compression. Our method introduces three complementary layer-wise diagnostics (Perplexity Drop, Representational Compactness, and Top-k Energy Gain) that reveal a canonical division of labour across layers, enabling automatic bit-width allocation without gradient updates. Unlike existing approaches that suffer severe accuracy degradation at 2-3 bit precision, LieQ achieves state-of-the-art compression-accuracy trade-offs: on Qwen3-4B, it recovers 95.9% of FP16 baseline performance at 2.05-bit quantization, outperforming GPTQ by 19.7% and AWQ by 18.1% on average across seven zero-shot reasoning tasks. Applied to LLaMA3.2-3B, LieQ maintains 98.2% of baseline accuracy at 2.07-bit precision while enabling 4x memory reduction, establishing new paradigms for deploying small language models on resource-constrained edge devices.
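To make the idea of diagnostic-driven, per-layer bit-width allocation concrete, here is a minimal sketch. It is an illustration only, not the paper's method: the `LayerDiagnostics` fields, the `sensitivity()` aggregation, the greedy `allocate_bits()` rule, and the synthetic scores are all assumptions standing in for whatever scoring and allocation LieQ actually uses.

```python
# Illustrative sketch of metric-driven per-layer bit-width allocation.
# The diagnostics and the allocation rule are assumptions, not LieQ's formulas.

from dataclasses import dataclass
from typing import List


@dataclass
class LayerDiagnostics:
    """Per-layer scores; higher is assumed to mean the layer carries more unique information."""
    perplexity_drop: float              # assumed proxy: PPL increase when the layer is weakened
    representational_compactness: float
    topk_energy_gain: float

    def sensitivity(self) -> float:
        # Assumed aggregation: plain average of the three (already normalized) diagnostics.
        return (self.perplexity_drop
                + self.representational_compactness
                + self.topk_energy_gain) / 3.0


def allocate_bits(diags: List[LayerDiagnostics],
                  avg_bit_budget: float,
                  low_bits: int = 2,
                  high_bits: int = 4) -> List[int]:
    """Greedy allocation: start every layer at low_bits, then promote the most
    sensitive layers to high_bits while the average bit-width stays within budget."""
    n = len(diags)
    bits = [low_bits] * n
    order = sorted(range(n), key=lambda i: diags[i].sensitivity(), reverse=True)
    for i in order:
        # Would promoting layer i keep the average within the budget?
        if (sum(bits) - low_bits + high_bits) / n <= avg_bit_budget:
            bits[i] = high_bits
    return bits


if __name__ == "__main__":
    import random
    random.seed(0)
    # Synthetic diagnostics for a hypothetical 36-layer model; the 2.5-bit budget
    # is chosen for readability (the paper reports averages around 2.05 bits).
    layers = [LayerDiagnostics(random.random(), random.random(), random.random())
              for _ in range(36)]
    bits = allocate_bits(layers, avg_bit_budget=2.5)
    print("per-layer bits:", bits)
    print("average bits:  ", sum(bits) / len(bits))
```

The greedy promotion loop is just one simple way to respect an average-bit budget; the paper's contribution is the layer-wise diagnostics themselves, which would replace the random scores above.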
