This document introduces STREAM3R, a method for scalable sequential 3D reconstruction that uses a causal Transformer to process streaming image data and update the reconstruction on the fly. Unlike previous approaches that operate on fixed image sets or struggle with long video sequences due to computational redundancy and limited memory, STREAM3R leverages uni-directional causal attention and a KV-Cache to integrate each new frame with the prior reconstruction efficiently. The method predicts dense 3D pointmaps and camera poses in both local and global coordinate systems, and achieves competitive or superior performance on benchmarks for monocular and video depth estimation, 3D reconstruction, and camera pose estimation. The paper also reports faster training and better convergence than existing RNN-based architectures.
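To make the streaming idea concrete, below is a minimal sketch of how a causal Transformer layer can ingest frames one at a time while reusing cached context from earlier frames, so no previous frame is re-encoded. This is not STREAM3R's actual implementation; the class name `CausalFrameDecoder`, the `step` method, the token/feature dimensions, and the toy pointmap head are all hypothetical, and the cache here stores raw context tokens rather than projected keys/values as a true KV-Cache would.

```python
import torch
import torch.nn as nn

class CausalFrameDecoder(nn.Module):
    """Toy single-layer causal decoder with a per-frame context cache.

    Tokens of each new frame attend to themselves and to the cached tokens of
    all previous frames (uni-directional attention), so earlier frames are
    never re-processed when a new frame arrives.
    """

    def __init__(self, dim=256, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.point_head = nn.Linear(dim, 3)  # toy head: one 3D point per token
        self.cache = None                    # context tokens of all past frames
                                             # (a real KV-cache would store the
                                             # projected keys/values instead)

    @torch.no_grad()
    def step(self, frame_tokens):
        """Process one new frame's tokens against the cached context."""
        q = self.norm(frame_tokens)
        if self.cache is None:
            ctx = q                           # first frame attends only to itself
        else:
            ctx = torch.cat([self.cache, q], dim=1)
        out, _ = self.attn(q, ctx, ctx)       # causal: new frame -> past + self
        self.cache = ctx                      # extend the cache for future frames
        return self.point_head(out)           # (batch, tokens, 3) pointmap

# Streaming usage: frames arrive one at a time; per-step cost grows with the
# cache length, but no earlier frame is ever re-encoded.
decoder = CausalFrameDecoder()
for t in range(5):
    tokens = torch.randn(1, 196, 256)         # (batch, patches per frame, dim)
    points = decoder.step(tokens)             # pointmap prediction for frame t
```

The design choice this illustrates is the same one the paper exploits: with uni-directional attention, the contribution of past frames can be cached once and reused, which is what allows streaming, on-the-fly updates instead of reprocessing the whole sequence per frame.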