This document introduces MetaStone-S1, a novel reflective generative model designed for Test-Time Scaling (TTS) in large language models (LLMs). The core innovation is a Reflective Generative Form that unifies the policy model and a Self-supervised Process Reward Model (SPRM) within a single network. This integration allows MetaStone-S1 to efficiently generate and select high-quality reasoning trajectories without relying on expensive, human-annotated process-level data, learning instead from outcome rewards. The research demonstrates that MetaStone-S1, with only 32 billion parameters, achieves performance comparable to OpenAI's o3-mini series across benchmarks spanning mathematics, coding, and Chinese reasoning. The paper also explores the scaling law of these models and identifies an "aha moment" during training at which the SPRM begins to distinguish effectively between correct and incorrect reasoning.
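
To make the unified policy/SPRM idea concrete, the sketch below illustrates one plausible way a single backbone could expose both a language-modeling (policy) head and a lightweight process-reward head whose per-step scores are aggregated to pick the best of N sampled reasoning trajectories. This is a minimal illustrative sketch, not the authors' implementation: the module names, the GRU stand-in backbone, the sigmoid scoring head, and the mean-score aggregation rule are all assumptions.

```python
import torch
import torch.nn as nn


class ReflectiveGenerativeModel(nn.Module):
    """Toy model sharing one backbone between a policy head and an SPRM-style head."""

    def __init__(self, vocab_size: int = 1000, d_model: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Stand-in for a transformer backbone; only the shared-representation idea matters here.
        self.backbone = nn.GRU(d_model, d_model, batch_first=True)
        self.lm_head = nn.Linear(d_model, vocab_size)   # policy head: next-token logits
        self.sprm_head = nn.Linear(d_model, 1)           # reward head: per-step quality score

    def forward(self, token_ids: torch.Tensor):
        h, _ = self.backbone(self.embed(token_ids))                    # (batch, seq, d_model)
        logits = self.lm_head(h)                                       # policy output
        step_scores = torch.sigmoid(self.sprm_head(h)).squeeze(-1)     # per-position score in (0, 1)
        return logits, step_scores


def select_best_trajectory(model: ReflectiveGenerativeModel,
                           trajectories: list[torch.Tensor]) -> int:
    """Score each candidate reasoning trajectory and return the index of the best one.

    The trajectory score here is simply the mean of the per-step scores; the
    paper's actual aggregation rule may differ.
    """
    with torch.no_grad():
        scores = []
        for traj in trajectories:
            _, step_scores = model(traj.unsqueeze(0))
            scores.append(step_scores.mean().item())
    return max(range(len(scores)), key=lambda i: scores[i])


if __name__ == "__main__":
    torch.manual_seed(0)
    model = ReflectiveGenerativeModel()
    # Pretend these are N=3 sampled reasoning trajectories (token-id sequences).
    candidates = [torch.randint(0, 1000, (20,)) for _ in range(3)]
    print(f"Selected trajectory #{select_best_trajectory(model, candidates)}")
```

Because both heads read the same hidden states, scoring a trajectory adds only a small head on top of generation, which is what lets this kind of design avoid a separately trained, annotation-hungry reward model.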