The podcast discusses a paper introducing LLM-Pruner, a task-agnostic framework for compressing Large Language Models (LLMs) through structural pruning. The framework works in three stages: Discovery, which identifies groups of coupled structures inside the model; Estimation, which scores each group's importance to decide what can be removed; and Recovery, a brief post-training phase that restores performance after pruning.
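To make the estimation stage concrete, here is a minimal, hypothetical sketch of first-order Taylor importance scoring on a toy layer. The layer and loss are stand-ins, not the paper's implementation, and the paper scores whole dependency groups rather than a single layer:

```python
import torch
import torch.nn as nn

# Toy linear layer standing in for one coupled group of weights.
layer = nn.Linear(16, 16)
x = torch.randn(4, 16)
loss = layer(x).pow(2).mean()  # stand-in loss; the paper uses the LLM's next-token loss
loss.backward()

# First-order Taylor importance per output channel (row of the weight matrix):
# |grad * weight| summed over a row approximates the change in loss if that
# channel were removed.
importance = (layer.weight.grad * layer.weight).abs().sum(dim=1)

# Drop the lowest-importance channels (here, the bottom 20%).
k = int(0.2 * importance.numel())
prune_idx = importance.argsort()[:k]
print("channels to prune:", prune_idx.tolist())
```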
By removing coupled structures rather than individual weights, and using LoRA (Low-Rank Adaptation) for fast post-pruning recovery instead of task-specific retraining, LLM-Pruner retains most of the original model's performance even after pruning 20% of its parameters, according to the paper's experiments.
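The recovery stage amounts to lightweight LoRA fine-tuning of the pruned model. Below is a minimal sketch using Hugging Face's `peft` library; the model path and LoRA hyperparameters are illustrative assumptions, not the paper's exact settings:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Hypothetical path to a checkpoint saved after structural pruning.
model = AutoModelForCausalLM.from_pretrained("path/to/pruned-llama")

# Attach low-rank adapters to the attention projections; only these small
# adapter matrices are trained during recovery, keeping the base weights frozen.
config = LoraConfig(
    r=8,                                   # adapter rank (illustrative)
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],   # assumed LLaMA-style module names
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()
# From here, a short language-modeling fine-tune restores most of the lost quality.
```

Because only the adapter parameters are updated, this recovery step is cheap relative to full retraining; the paper reports it takes roughly three hours on about 50K samples.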
Read full paper: https://arxiv.org/abs/2305.11627
Tags: Artificial Intelligence, Natural Language Processing, Model Compression