This document introduces STREAM3R, a novel method for scalable sequential 3D reconstruction from streaming input images using a causal Transformer architecture. Unlike prior methods that process fixed image sets or incur redundant computation on continuous streams, STREAM3R updates 3D geometry and camera poses incrementally by caching features from previously observed frames, so earlier frames are not reprocessed when a new one arrives. The research demonstrates that this Transformer-based approach achieves competitive or superior performance on monocular and video depth estimation, 3D reconstruction, and camera pose estimation, even in dynamic environments, while offering faster inference. Because the method learns geometric priors from large-scale 3D datasets, it generalizes more readily and supports real-time reconstruction.
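
The caching idea described above can be illustrated with a minimal sketch. This is not STREAM3R's actual implementation: the class name `StreamingCausalAttention`, the single attention head, the random projection weights, and the token shapes are all assumptions made for illustration. The sketch only shows why caching keys and values from earlier frames lets a causal attention layer process each new frame once while matching a full recomputation over the whole sequence.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax; masked (-inf) entries become exactly 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class StreamingCausalAttention:
    """Hypothetical single-head attention layer with a per-frame key/value cache."""

    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(dim)
        # Random projections stand in for learned weights (assumption).
        self.wq = rng.normal(size=(dim, dim)) * scale
        self.wk = rng.normal(size=(dim, dim)) * scale
        self.wv = rng.normal(size=(dim, dim)) * scale
        self.dim = dim
        self.k_cache = np.empty((0, dim))
        self.v_cache = np.empty((0, dim))

    def step(self, frame_tokens):
        """Process one new frame: append its keys/values, attend to the cache."""
        q = frame_tokens @ self.wq
        self.k_cache = np.concatenate([self.k_cache, frame_tokens @ self.wk])
        self.v_cache = np.concatenate([self.v_cache, frame_tokens @ self.wv])
        scores = q @ self.k_cache.T / np.sqrt(self.dim)
        return softmax(scores) @ self.v_cache

    def full_sequence(self, frames):
        """Reference path: recompute everything with a block-causal mask."""
        x = np.concatenate(frames)
        q, k, v = x @ self.wq, x @ self.wk, x @ self.wv
        frame_id = np.repeat(np.arange(len(frames)), [len(f) for f in frames])
        # A token may attend to tokens in its own frame and in earlier frames.
        mask = frame_id[:, None] >= frame_id[None, :]
        scores = np.where(mask, q @ k.T / np.sqrt(self.dim), -np.inf)
        return softmax(scores) @ v


# Three frames of 4 tokens each, 8 features per token (arbitrary sizes).
rng = np.random.default_rng(1)
frames = [rng.normal(size=(4, 8)) for _ in range(3)]

attn = StreamingCausalAttention(dim=8)
streamed = np.concatenate([attn.step(f) for f in frames])
reference = attn.full_sequence(frames)
print(np.allclose(streamed, reference))
```

In this sketch the streaming path projects each frame's tokens only once, whereas rerunning `full_sequence` on every new frame would repeat the projections and attention for all earlier frames. The final check compares the two paths, which should agree up to floating-point error.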