Standard
AWQ (Activation-aware Weight Quantization) is a hardware-friendly technique for compressing large language models (LLMs) to enable efficient, low-bit inference on edge devices. Unlike traditional methods that treat all weights equally, AWQ identifies and protects the most salient weight channels by analyzing activation distributions, significantly reducing quantization error without resorting to costly mixed-precision computation. Because it requires no backpropagation or retraining, AWQ generalizes well across tasks and domains, including instruction-tuned and multi-modal models.
| Field      | Value |
|------------|-------|
| identifier | AWQ |
| Publisher  | MIT HAN Lab |
| Author     | |
| URL        | |
| title      | Activation-aware Weight Quantization for LLM Compression and Acceleration |
| version    | |
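
As a rough illustration of the activation-aware idea described above, the NumPy sketch below scales each input channel of a weight matrix by a factor derived from calibration-activation statistics, applies round-to-nearest quantization to the scaled weights, and folds the scale back so the layer's output is preserved. The function names, the per-output-channel quantization granularity, and the grid search over a single exponent are illustrative assumptions, not the official AWQ implementation.

```python
import numpy as np

def pseudo_quantize(w, n_bits=4):
    # Round-to-nearest, asymmetric quantization per output channel (illustrative granularity).
    w_max = w.max(axis=1, keepdims=True)
    w_min = w.min(axis=1, keepdims=True)
    scales = np.clip(w_max - w_min, 1e-5, None) / (2 ** n_bits - 1)
    zeros = np.round(-w_min / scales)
    w_int = np.clip(np.round(w / scales) + zeros, 0, 2 ** n_bits - 1)
    return (w_int - zeros) * scales  # de-quantized weights, used to measure the error

def awq_like_scale_search(w, x, n_bits=4, n_grid=20):
    # Hypothetical sketch: choose per-input-channel scales s = mean(|x|)**alpha,
    # picking alpha by grid search to minimize the quantized layer's output error.
    # w: (out_features, in_features) weights; x: (tokens, in_features) calibration activations.
    act_mean = np.abs(x).mean(axis=0) + 1e-8          # per-channel activation magnitude
    y_ref = x @ w.T                                    # full-precision reference output
    best_err, best_s = np.inf, np.ones_like(act_mean)
    for alpha in np.linspace(0.0, 1.0, n_grid):
        s = act_mean ** alpha
        s = s / np.sqrt(s.max() * s.min())             # keep scales roughly centered around 1
        w_q = pseudo_quantize(w * s, n_bits) / s       # scale up salient channels, quantize, fold back
        err = np.mean((x @ w_q.T - y_ref) ** 2)
        if err < best_err:
            best_err, best_s = err, s
    return best_s, best_err
```

In practice the searched scales are absorbed into the preceding operator rather than applied at runtime, which is why the method avoids mixed-precision kernels while still protecting the salient channels.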