
Mixture-of-Recursions: Adaptive Computation for Language Models

Author: Neural Intelligence Network
Published: Sat 16 Aug 2025
Episode Link: https://podcasters.spotify.com/pod/show/neuralintelpod/episodes/Mixture-of-Recursions-Adaptive-Computation-for-Language-Models-e36och7

This episode introduces Mixture-of-Recursions (MoR), a novel Transformer architecture designed to improve the efficiency of large language models. MoR combines parameter sharing and adaptive computation within a Recursive Transformer: lightweight routers dynamically assign each token an appropriate recursion depth, concentrating computational effort on the tokens that need it most. MoR also pairs this routing with two efficient Key-Value (KV) caching strategies, recursion-wise caching and recursive KV sharing, which reduce memory footprint and improve inference throughput. Experiments show that MoR consistently matches or exceeds existing baselines while using fewer parameters and lower computational cost.
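To make the token-level routing idea concrete, here is a minimal PyTorch sketch, not the paper's implementation: a single shared Transformer block is re-applied up to a maximum recursion depth, and a hypothetical linear router gates, per token, whether to continue recursing. The class name `MoRBlock`, the `max_recursions` parameter, and the 0.5 exit threshold are illustrative assumptions, and the KV caching strategies are omitted for brevity.

```python
import torch
import torch.nn as nn


class MoRBlock(nn.Module):
    """Toy Mixture-of-Recursions layer: one parameter-shared Transformer
    block applied up to `max_recursions` times, with a lightweight linear
    router deciding per token whether to take another recursion step."""

    def __init__(self, d_model=256, n_heads=4, max_recursions=3):
        super().__init__()
        self.shared_block = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True
        )
        self.router = nn.Linear(d_model, 1)  # per-token "continue" score
        self.max_recursions = max_recursions

    def forward(self, x):
        # active[b, t] is True while token t still receives compute.
        active = torch.ones(x.shape[:2], dtype=torch.bool, device=x.device)
        for _ in range(self.max_recursions):
            if not active.any():
                break
            updated = self.shared_block(x)
            # Active tokens take the updated state; exited tokens keep
            # theirs. (A real implementation would skip compute for exited
            # tokens entirely; here we mask for clarity.)
            x = torch.where(active.unsqueeze(-1), updated, x)
            gate = torch.sigmoid(self.router(x)).squeeze(-1)  # (B, T)
            active = active & (gate > 0.5)  # router-chosen early exit
        return x


tokens = torch.randn(2, 16, 256)  # (batch, seq_len, d_model)
out = MoRBlock()(tokens)
print(out.shape)  # torch.Size([2, 16, 256])
```

Because the same block weights are reused at every depth, deeper recursion adds compute but no parameters, which is what lets MoR spend effort adaptively per token while keeping the model small.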
