
Mixture-of-Recursions: Adaptive Computation for Language Models

Author: Neural Intelligence Network
Published: Sat 16 Aug 2025
Episode Link: https://podcasters.spotify.com/pod/show/neuralintelpod/episodes/Mixture-of-Recursions-Adaptive-Computation-for-Language-Models-e36och7

This episode introduces Mixture-of-Recursions (MoR), a novel Transformer architecture designed to improve the efficiency of large language models. MoR combines parameter sharing and adaptive computation within a Recursive Transformer: lightweight routers dynamically assign each token an appropriate recursion depth, concentrating computational effort on the tokens that need it most. MoR also pairs this routing with two efficient Key-Value (KV) caching strategies, recursion-wise caching and recursive KV sharing, which reduce memory footprint and improve inference throughput. Experiments show that MoR consistently matches or exceeds existing baselines while using fewer parameters and lower computational cost.
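To make the token-level routing idea concrete, here is a minimal PyTorch sketch, not the paper's implementation: a single shared Transformer block is re-applied up to a maximum recursion depth, and a hypothetical linear router gates, per token, whether to continue recursing. The class name `MoRBlock`, the `max_recursions` parameter, and the 0.5 exit threshold are illustrative assumptions, and the KV caching strategies are omitted for brevity.

```python
import torch
import torch.nn as nn


class MoRBlock(nn.Module):
    """Toy Mixture-of-Recursions layer: one parameter-shared Transformer
    block applied up to `max_recursions` times, with a lightweight linear
    router deciding per token whether to take another recursion step."""

    def __init__(self, d_model=256, n_heads=4, max_recursions=3):
        super().__init__()
        self.shared_block = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True
        )
        self.router = nn.Linear(d_model, 1)  # per-token "continue" score
        self.max_recursions = max_recursions

    def forward(self, x):
        # active[b, t] is True while token t still receives compute.
        active = torch.ones(x.shape[:2], dtype=torch.bool, device=x.device)
        for _ in range(self.max_recursions):
            if not active.any():
                break
            updated = self.shared_block(x)
            # Active tokens take the updated state; exited tokens keep
            # theirs. (A real implementation would skip compute for exited
            # tokens entirely; here we mask for clarity.)
            x = torch.where(active.unsqueeze(-1), updated, x)
            gate = torch.sigmoid(self.router(x)).squeeze(-1)  # (B, T)
            active = active & (gate > 0.5)  # router-chosen early exit
        return x


tokens = torch.randn(2, 16, 256)  # (batch, seq_len, d_model)
out = MoRBlock()(tokens)
print(out.shape)  # torch.Size([2, 16, 256])
```

Because the same block weights are reused at every depth, deeper recursion adds compute but no parameters, which is what lets MoR spend effort adaptively per token while keeping the model small.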
