Models tell you what to discard

Author: Arjun Srivastava
Published: Thu 18 Jul 2024
Episode Link: https://arjunsriva.com/podcast/podcasts/2310.01801/

This paper introduces FastGen, a method that uses lightweight model profiling and adaptive key-value (KV) caching to significantly reduce the memory footprint of large language model inference without noticeable loss in generation quality.

Read full paper: https://arxiv.org/abs/2310.01801

Tags: Systems and Performance, Machine Learning, Optimization
