Generative Recursive Reasoning
Junyeob Baek, Mingyu Jo, Minsu Kim, Mengye Ren, Yoshua Bengio, Sungjin Ahn
KAIST, Mila – Québec AI Institute, NYU, Université de Montréal
TL;DR
"GRAM turns recursive latent reasoning from a deterministic single-track update into a probabilistic, multi-trajectory search. By injecting state-dependent stochastic guidance, GRAM can maintain multiple hypotheses, solve complex multi-solution constraint tasks, and scale inference-time performance across both depth and width."
Why This Matters
Traditional Large Language Models (LLMs) scale their reasoning capacity by generating longer sequences of text (e.g., Chain-of-Thought tokens). However, this sequence-generation process is computationally expensive and binds reasoning depth directly to output length.
Recursive Reasoning Models (RRMs) offer an elegant alternative: instead of appending tokens, they repeatedly apply a shared neural network block to refine a persistent internal latent state (a continuous vector representation holding the model's internal beliefs or reasoning path).
But existing RRMs have a fatal flaw: they are completely deterministic. Given an input, they trace a single, unyielding path in the latent space. If they make an early mistake or encounter a problem with multiple valid answers (like Sudoku or Graph Coloring), they get trapped in local minima and fail.
The Big Idea
Generative Recursive reAsoning Models (GRAM) convert recursive latent reasoning into a probabilistic, multi-trajectory computation.
Instead of updating the latent state deterministically, GRAM models the reasoning trajectory as a stochastic process. At each step, it samples a state transition from a learned distribution. This allows the model to:
- Maintain uncertainty: Explore multiple parallel hypotheses at once.
- Scale with width: Generate multiple independent reasoning paths and select the best using a Latent Process Reward Model (LPRM).
- Generate unconditionally: Act as a generative model to produce complex structured outputs (like valid Sudoku boards) from scratch.
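The width-scaling idea in the list above can be sketched as a best-of-N selection. This is a toy illustration only: `sample_trajectory`, `lprm_score`, and `best_of_n` are hypothetical names, the random walk stands in for a stochastic rollout, and the scorer is a hand-written stand-in that peeks at a known target, whereas a trained LPRM would have to predict correctness from the latent state alone.

```python
import random

def sample_trajectory(x, seed, steps=5, noise=0.5):
    """Toy stochastic rollout: a 1-D random walk starting at x.
    Assumption: stands in for one sampled reasoning trajectory."""
    rng = random.Random(seed)
    h = x
    for _ in range(steps):
        h = h + rng.gauss(0.0, noise)
    return h

def lprm_score(h, target=2.0):
    """Hypothetical reward stand-in: higher is better, maximal at h == target.
    A real LPRM would not have access to the target."""
    return -abs(h - target)

def best_of_n(x, n, score_fn=lprm_score):
    """Sample n independent trajectories and keep the highest-scoring one."""
    candidates = [sample_trajectory(x, seed) for seed in range(n)]
    return max(candidates, key=score_fn), candidates
```

Because the seeds are nested (`range(4)` is a prefix of `range(16)`), the best score in this toy can only stay the same or improve as the width `n` grows.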
How It Works
GRAM organizes its recursive computation into two nested loops:
- Inner Loop (Deterministic Refinement): A low-level state $l_t$ is refined $K$ times to perform fine-grained intermediate computation.
- Outer Loop (Stochastic Update): A high-level state $h_t$ is updated stochastically by adding a state-dependent residual perturbation $\epsilon_t$ to a deterministic proposal $u_t$.
The transition is formulated as:
$$u_t = f_H(h_{t-1}, l_t)$$
$$\epsilon_t \sim p_\theta(\epsilon_t \mid u_t) := \mathcal{N}(\mu_\theta(u_t), \sigma_\theta^2(u_t) I)$$
$$h_t = u_t + \epsilon_t$$
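A minimal sketch of one such step, assuming toy stand-ins for every learned component, might look as follows. The functions `inner_refine`, `f_H`, `mu_theta`, and `sigma_theta` here are fixed hand-written maps chosen only for illustration, not the networks from the paper; the input `x` and the scalar (isotropic) standard deviation are likewise assumptions.

```python
import math
import random

def inner_refine(l, h, x, K):
    """Deterministic inner loop: refine the low-level state K times (toy update)."""
    for _ in range(K):
        l = [math.tanh(0.5 * li + 0.3 * hi + 0.2 * xi) for li, hi, xi in zip(l, h, x)]
    return l

def f_H(h_prev, l):
    """Deterministic proposal u_t = f_H(h_{t-1}, l_t) (toy stand-in)."""
    return [math.tanh(0.6 * hi + 0.4 * li) for hi, li in zip(h_prev, l)]

def mu_theta(u):
    """State-dependent noise mean (toy stand-in for a learned network)."""
    return [0.1 * ui for ui in u]

def sigma_theta(u):
    """State-dependent scalar std, always positive (assumed isotropic)."""
    return 0.05 + 0.1 * sum(abs(ui) for ui in u) / len(u)

def gram_step(h_prev, l_prev, x, K, rng):
    """One outer step: refine l, propose u_t, sample eps_t, set h_t = u_t + eps_t."""
    l = inner_refine(l_prev, h_prev, x, K)
    u = f_H(h_prev, l)
    mu, sigma = mu_theta(u), sigma_theta(u)
    eps = [rng.gauss(m, sigma) for m in mu]  # eps_t ~ N(mu(u_t), sigma(u_t)^2 I)
    h = [ui + ei for ui, ei in zip(u, eps)]
    return h, l

def rollout(x, T, K, seed):
    """Run T outer steps from zero-initialized states and return the final h_T."""
    rng = random.Random(seed)
    h, l = [0.0] * len(x), [0.0] * len(x)
    for _ in range(T):
        h, l = gram_step(h, l, x, K, rng)
    return h
```

Calling `rollout` with the same seed reproduces the same trajectory, while different seeds yield different final states for the same input, which is the property that makes multi-trajectory sampling possible.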
Interactive Simulation
See the difference between deterministic RRMs and GRAM's stochastic multi-trajectory exploration in real-time.
1. Latent Trajectory Simulator
Watch how deterministic paths collapse into a suboptimal local minimum, while GRAM explores the space to find the global optimum.
2. Multi-Solution Explorer (4x4 N-Queens)
Deterministic models always find the same solution or collapse. GRAM's stochastic transitions explore different paths to find all valid configurations.
How it works here: The 4-Queens puzzle (N-Queens on a 4x4 board) has exactly 2 valid solutions. A deterministic model with a fixed initialization always returns the same one of the two, so half of the solution space is never reached.
By clicking "Sample GRAM Path", you simulate a stochastic run that branches dynamically, allowing the model to recover both solutions over multiple parallel runs.
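The contrast can be reproduced with a small classical search. The sketch below is an analogy, not GRAM itself: a backtracking solver with a fixed column order plays the role of the deterministic model, and the same solver with a randomly shuffled column order plays the role of a stochastic trajectory. All function names are hypothetical.

```python
import random
from itertools import permutations

N = 4

def is_valid(cols):
    """cols[r] is the queen's column in row r; reject shared columns and diagonals."""
    for r1 in range(len(cols)):
        for r2 in range(r1 + 1, len(cols)):
            if cols[r1] == cols[r2] or abs(cols[r1] - cols[r2]) == r2 - r1:
                return False
    return True

def solve(order_fn):
    """Backtracking search; order_fn decides the order in which columns are tried."""
    def extend(cols):
        if len(cols) == N:
            return tuple(cols)
        for c in order_fn(list(range(N))):
            if is_valid(cols + [c]):
                found = extend(cols + [c])
                if found is not None:
                    return found
        return None
    return extend([])

def deterministic_order(columns):
    """Always try columns left to right."""
    return columns

def make_stochastic_order(rng):
    """Try columns in a freshly shuffled order at every row."""
    def order(columns):
        rng.shuffle(columns)
        return columns
    return order

# Exhaustive check of the solution count.
all_solutions = {p for p in permutations(range(N)) if is_valid(p)}

# The fixed-order solver returns the same board on every run.
deterministic_runs = {solve(deterministic_order) for _ in range(20)}

# The shuffled-order solver reaches different boards across runs.
rng = random.Random(0)
stochastic_runs = {solve(make_stochastic_order(rng)) for _ in range(200)}
```

Here `all_solutions` contains the two valid boards, `deterministic_runs` contains only `(1, 3, 0, 2)`, and the shuffled runs collectively cover both boards.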
Headline Results
GRAM was evaluated on structured reasoning benchmarks, where it outperforms previous state-of-the-art deterministic recursive models.
Benchmark Accuracy Comparison
Note: Large reasoning models (such as DeepSeek-R1) are included as external reference points for task difficulty rather than as direct baselines.
Limitations & Open Questions
While GRAM is a substantial step forward for recursive latent reasoning architectures, the authors highlight several open challenges:
- Training Efficiency Bottleneck: Unlike standard Transformers that can be trained with highly parallelized teacher forcing, GRAM's deep supervision requires sequential backpropagation through time (or truncated steps), limiting training speed.
- Scaling to Foundation Models: Due to the training limitations, scaling GRAM to multi-billion parameter sizes remains a significant engineering hurdle.
- Safety & Verification: Because GRAM generates plausible reasoning paths stochastically, there is a risk of generating convincing-looking but structurally invalid solutions if the Latent Process Reward Model (LPRM) fails to filter them.
Glossary
Recursive Reasoning Models (RRMs)
Neural network architectures that perform iterative refinement of a single persistent latent state using shared parameters, decoupling computation depth from parameter scale.
Latent Process Reward Model (LPRM)
A value head trained to predict the final correctness of a reasoning trajectory directly from its intermediate latent state, enabling efficient width-based scaling at inference time.
Evidence Lower Bound (ELBO)
A standard objective function used in variational inference to train probabilistic latent-variable models by maximizing a lower bound on the true data likelihood.
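For reference, the textbook form of the ELBO for a generic latent-variable model is given below. The symbols are assumptions introduced here for illustration ($x$ for the observed data, $z$ for the latent variables, $q_\phi$ for an approximate posterior), and this is the generic bound rather than necessarily the exact objective used to train GRAM:

$$\log p_\theta(x) \ge \mathbb{E}_{q_\phi(z \mid x)}\left[\log p_\theta(x \mid z)\right] - \mathrm{KL}\left(q_\phi(z \mid x) \,\|\, p_\theta(z)\right)$$

The first term rewards latents that explain the data, and the KL term keeps the approximate posterior close to the prior.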
Cite this work
@article{baek2026generative,
title={Generative Recursive Reasoning},
author={Baek, Junyeob and Jo, Mingyu and Kim, Minsu and Ren, Mengye and Bengio, Yoshua and Ahn, Sungjin},
journal={arXiv preprint arXiv:2605.19376},
year={2026}
}