The source introduces Mixture-of-Recursions (MoR), a Transformer architecture designed to improve the efficiency of large language models. MoR combines parameter sharing and adaptive computation within a Recursive Transformer: lightweight routers dynamically assign each token its own recursion depth, concentrating computational effort on the tokens that need it most. MoR further pairs this routing with efficient key-value (KV) caching strategies, recursion-wise caching and recursive sharing, to reduce memory footprint and improve inference throughput. Experiments show that MoR matches or exceeds the performance of existing baselines while using fewer parameters and lower computational cost.
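To make the core idea concrete, below is a minimal sketch (not the authors' implementation) of per-token adaptive recursion over a weight-shared Transformer block. The class and parameter names (`SharedBlock`, `TokenDepthRouter`, `max_recursions`) are assumptions for illustration; the sketch shows only the routing logic and omits MoR's actual training objective, compute savings from skipping exited tokens, and the KV-caching schemes.

```python
# Illustrative sketch of Mixture-of-Recursions-style routing (assumed names,
# not the paper's code): one shared block reused across recursion steps, with
# a lightweight router choosing how many steps each token receives.
import torch
import torch.nn as nn


class SharedBlock(nn.Module):
    """A single Transformer block whose weights are reused at every recursion step."""

    def __init__(self, d_model: int, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                nn.Linear(4 * d_model, d_model))
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + attn_out
        return x + self.ff(self.norm2(x))


class TokenDepthRouter(nn.Module):
    """Lightweight router: predicts how many recursion steps each token needs."""

    def __init__(self, d_model: int, max_recursions: int):
        super().__init__()
        self.proj = nn.Linear(d_model, max_recursions)

    def forward(self, x):
        # Hard per-token depth choice (argmax); a trained router would use a
        # differentiable relaxation or an auxiliary routing loss instead.
        return self.proj(x).argmax(dim=-1) + 1  # depths in [1, max_recursions]


class MixtureOfRecursionsSketch(nn.Module):
    def __init__(self, d_model: int = 64, max_recursions: int = 3):
        super().__init__()
        self.block = SharedBlock(d_model)      # parameters shared across all depths
        self.router = TokenDepthRouter(d_model, max_recursions)
        self.max_recursions = max_recursions

    def forward(self, x):                      # x: (batch, seq, d_model)
        depths = self.router(x)                # (batch, seq) target depth per token
        for step in range(1, self.max_recursions + 1):
            updated = self.block(x)
            # Tokens whose assigned depth reaches this step keep refining;
            # the rest are carried forward unchanged ("exited" tokens).
            active = (depths >= step).unsqueeze(-1)
            x = torch.where(active, updated, x)
        return x


if __name__ == "__main__":
    model = MixtureOfRecursionsSketch()
    tokens = torch.randn(2, 10, 64)
    print(model(tokens).shape)  # torch.Size([2, 10, 64])
```

In this toy version the shared block still runs over the full sequence at every step, so it demonstrates the routing behavior rather than the efficiency gains; in MoR, exited tokens are excluded from further attention and feed-forward computation, and the recursion-wise or shared KV caches keep their key-value memory cost low.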