
Exploring the Performance of Low-Bit Quantized LLAMA3 Models: An In-Depth Investigation

Emad Dehnavi
2 min read · May 14, 2024


I was reading an article called How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study (by Wei Huang, Xudong Ma, Haotong Qin, Xingyu Zheng, Chengtao Lv, Hong Chen, Jie Luo, Xiaojuan Qi, Xianglong Liu, and Michele Magno), and as someone new to this field, I found it eye-opening.

The article explores how well LLAMA3 models hold up when quantized to different bit-widths, covering both post-training quantization and LoRA (Low-Rank Adaptation) fine-tuning quantization. Despite their impressive full-precision performance, these models still take a noticeable hit when quantized to low bit-widths, and the drops grow severe at ultra-low bit-widths. It's a challenge that needs addressing if we want to make LLAMA3 usable in all sorts of deployment scenarios.
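To make "quantized to different bit-widths" concrete, here is a minimal sketch of round-to-nearest (RTN) quantization, one of the simplest post-training baselines: weights are mapped to a small set of integer levels and back, and the fewer bits you use, the coarser the grid and the larger the error. The per-tensor scale and the specific shapes are my assumptions for illustration; the methods evaluated in the paper are more sophisticated.

```python
import numpy as np

def quantize_rtn(w, bits):
    """Round-to-nearest symmetric quantization of a weight tensor.

    A minimal sketch of what simple post-training quantization does:
    snap floats to 2**bits integer levels, then scale back.
    """
    qmax = 2 ** (bits - 1) - 1              # e.g. 7 for signed 4-bit
    scale = np.abs(w).max() / qmax          # per-tensor scale (an assumption;
                                            # real methods often use per-channel scales)
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale                        # dequantized ("fake-quantized") weights

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)
w4 = quantize_rtn(w, bits=4)    # modest reconstruction error at 4 bits
w2 = quantize_rtn(w, bits=2)    # much larger error at 2 bits
```

Comparing `w` against `w4` and `w2` shows the effect the paper measures at model scale: the 2-bit reconstruction is far further from the original weights than the 4-bit one.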

Post-Training Quantization

  • Evaluation results of post-training quantization on the LLAMA3–8B model
  • Evaluation results of post-training quantization on the LLAMA3–70B model

LoRA-FineTuning Quantization

  • LoRA-FT results on LLAMA3–8B with the Alpaca dataset
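For readers new to LoRA-FT, the core idea is that the pretrained weight matrix stays frozen and only a low-rank update is trained, so the effective weight becomes W + B·A with rank r much smaller than the layer dimensions. The sketch below shows just the forward pass; the dimensions and variable names are illustrative (following the LoRA paper's notation), not taken from the study.

```python
import numpy as np

# Frozen pretrained weight plus a trainable low-rank update:
# effective weight = W + B @ A, with rank r << layer dimensions.
# Dimensions here are made up for illustration.
d_in, d_out, r = 64, 64, 8

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))       # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01    # trainable down-projection
B = np.zeros((d_out, r))                     # trainable up-projection, zero-init

x = rng.standard_normal(d_in)
y = W @ x + B @ (A @ x)   # forward pass; equals W @ x at init, since B is zero
```

Because B starts at zero, fine-tuning begins exactly at the pretrained model's behavior, and only the small A and B matrices receive gradient updates, which is what makes LoRA-FT cheap enough to pair with quantized base weights.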

This underscores the notable performance gap observed at low bit-widths and signals the need for further advances. The study can serve as a valuable reference for future work, driving LLMs toward lower bit-widths while keeping accuracy high enough to be practical.

You can read more, and even go through the full scripts used to evaluate the various quantization methods, in the project repository, LLaMA3-Quantization.


Written by Emad Dehnavi

With 8 years as a software engineer, I write about AI and technology in a simple way. My goal is to make these topics easy and interesting for everyone.
