Diffusion models have recently emerged as a powerful tool for planning. However, unlike Monte Carlo Tree Search (MCTS)—whose performance naturally improves with inference-time computation scaling—standard diffusion-based planners offer only limited avenues for such scalability. In this paper, we introduce Monte Carlo Tree Diffusion (MCTD), a novel framework that integrates the generative strength of diffusion models with the adaptive search capabilities of MCTS. Our method reconceptualizes denoising as a tree-structured process, allowing partially denoised plans to be iteratively evaluated, pruned, and refined. By selectively expanding promising trajectories while retaining the flexibility to revisit and improve suboptimal branches, MCTD achieves the benefits of MCTS, such as controlling exploration-exploitation trade-offs, within the diffusion framework. Empirical results on challenging long-horizon tasks show that MCTD outperforms diffusion baselines, yielding higher-quality solutions as inference-time computation increases.
Monte Carlo Tree Diffusion rests on three key concepts that enable tree-structured planning: (1) Denoising as Tree-Rollout, (2) Guidance Levels as Meta-Actions, and (3) Jumpy Denoising as Simulation. These innovations form the foundation of MCTD, bridging the gap between traditional tree search methods and diffusion-based planning.
Unlike a standard diffusion model, our approach partitions the trajectory $\mathbf{x}$ into $S$ subplans, $\mathbf{x}=[\mathbf{x}_1,\mathbf{x}_2,\dots,\mathbf{x}_S]$, and assigns a separate denoising schedule to each subplan $\mathbf{x}_s$: faster for the earlier segments and slower for the later ones. This makes the process causal and semi-autoregressive, so that the future is denoised based on the already refined past.
$$
\begin{align}
p(\mathbf{x}) \approx \prod_{s=1}^S p(\mathbf{x}_s|\mathbf{x}_{1:s-1})
\end{align}
$$
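The staggered schedule above can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the function names and the concrete parameters `T` (total diffusion steps), `S` (number of subplans), and `offset` (per-subplan lag) are assumptions chosen for clarity.

```python
def staggered_noise_levels(step, T=100, S=4, offset=20):
    """Normalized noise level of each subplan at a given denoising iteration.

    Earlier subplans start denoising sooner, so at any iteration they are
    cleaner than later ones; later subplans are then refined conditioned on
    an already-refined past. All parameter names here are illustrative.
    """
    levels = []
    for s in range(S):
        # Subplan s lags behind subplan 0 by s * offset iterations.
        t = min(max(T - (step - s * offset), 0), T)
        levels.append(t / T)  # 1.0 = pure noise, 0.0 = fully denoised
    return levels
```

For example, halfway through denoising, `staggered_noise_levels(50)` shows the first subplan half denoised while the last is still pure noise, which is what makes the process semi-autoregressive.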
Unlike standard Monte Carlo Tree Search over the state space, MCTD operates at a higher level of abstraction, the subplan, enabling a more efficient and scalable tree search.
To search the tree over the subplan space, MCTD uses guidance levels as meta-actions to control the exploration-exploitation trade-off.
For instance, consider two guidance levels: $\text{GUIDE}$ and $\text{NO\_GUIDE}$. Sampling at level $\text{NO\_GUIDE}$ draws from the prior distribution $p(\mathbf{x})$ learned from offline data, which corresponds to exploratory behavior. Conversely, sampling at level $\text{GUIDE}$ draws from the goal-directed distribution $p_g(\mathbf{x})$, e.g., via classifier-guided diffusion (Dhariwal & Nichol, 2021), which represents exploitative behavior.
By dynamically adjusting the guidance levels $g_s$ for each subplan $\mathbf{x}_s$, MCTD can balance the exploration-exploitation trade-off at the level of subplans within a single denoising process.
$$
p(\mathbf{x}|\mathbf{g}) \approx \prod_{s=1}^S p(\mathbf{x}_s|\mathbf{x}_{1:s-1}, g_s)
$$
As a result, this approach enables efficient and scalable planning, even in complex or continuous action spaces.
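The role of a meta-action can be sketched as toggling a guidance term in the per-subplan denoising update. The functions below are toy stand-ins, not the paper's networks: `-x_s` plays the role of the prior score of $p(\mathbf{x})$ and `(goal - x_s)` the role of a classifier-guidance gradient toward the goal.

```python
def guided_noise_pred(x_s, goal, guide):
    """Denoising direction for one subplan under a discrete meta-action.

    Both terms are toy placeholders for the trained diffusion model:
    the meta-action merely switches the goal-directed term on or off.
    """
    prior_score = -x_s                    # exploratory: follow the data prior
    goal_grad = goal - x_s                # exploitative: pull toward the goal
    w = 1.0 if guide == "GUIDE" else 0.0  # the meta-action sets the guidance weight
    return prior_score + w * goal_grad

# One meta-action per subplan: explore early, exploit near the goal.
meta_actions = ["NO_GUIDE", "NO_GUIDE", "GUIDE", "GUIDE"]
subplans = [0.5, 1.0, 2.0, 3.0]
updates = [guided_noise_pred(x, 5.0, g) for x, g in zip(subplans, meta_actions)]
```

Because the meta-action is discrete, the tree branches over a small set of guidance choices per subplan rather than over raw continuous actions, which is what keeps the search tractable.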
To simulate the tree-rollout process efficiently, MCTD employs a fast jumpy denoising process based on the Denoising Diffusion Implicit Model (DDIM) (Song et al., 2020). Specifically, once the tree-rollout denoising has progressed up to the \( s \)-th subplan, the remaining subplans are denoised quickly by jumping $C$ timesteps at a time:
$$
\tilde{\mathbf{x}}_{s+1:S} \sim p(\mathbf{x}_{s+1:S}|\mathbf{x}_{1:s}, \mathbf{g}).
$$
This produces a full trajectory \( \tilde{\mathbf{x}} = (\mathbf{x}_{1:s}, \tilde{\mathbf{x}}_{s+1:S})\), which is then evaluated using the reward function \( r(\tilde{\mathbf{x}}) \). While this fast denoising process may introduce larger approximation errors, it is highly computationally efficient, making it well-suited for the simulation step in MCTD.
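The jumpy schedule can be sketched as a coarse timestep loop. Here `denoise_step(x, t, t_next)` is a hypothetical one-jump DDIM update supplied by the caller; the real model would predict noise and take a deterministic DDIM step between the two timesteps.

```python
def jumpy_denoise(x_noisy, denoise_step, T=100, C=10):
    """Coarse DDIM-style rollout that visits only every C-th timestep.

    With T=100 and C=10 this takes 10 jumps instead of 100 small steps,
    trading approximation error for speed, as in the simulation phase.
    `denoise_step` is an assumed callback, not a fixed API.
    """
    timesteps = list(range(T, 0, -C)) + [0]  # e.g. 100, 90, ..., 10, 0
    x = x_noisy
    for t, t_next in zip(timesteps[:-1], timesteps[1:]):
        x = denoise_step(x, t, t_next)
    return x
```

A full simulation would apply this only to the not-yet-denoised suffix \( \mathbf{x}_{s+1:S} \), holding the already refined prefix fixed, and then score the completed trajectory with the reward function.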
Our framework instantiates these three components through a modified MCTS approach, where the four canonical steps are adapted to handle subplan generation within the diffusion-based planning process.
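One full Selection/Expansion/Simulation/Backpropagation cycle over subplans can be sketched as below. This is a schematic under stated assumptions: `expand(node)` stands in for denoising the next subplan once per candidate meta-action, and `simulate(node)` for jumpy denoising of the remaining subplans followed by reward evaluation; neither is the paper's concrete interface.

```python
import math
import random

class Node:
    """A tree node holding a partially denoised plan up to `depth` subplans."""
    def __init__(self, depth, parent=None):
        self.depth, self.parent = depth, parent
        self.children, self.visits, self.value = [], 0, 0.0

def mctd_iteration(root, S, expand, simulate, c=1.4):
    """One adapted MCTS cycle; `expand` and `simulate` are assumed callbacks."""
    # Selection: UCB descent through partially denoised subplans.
    node = root
    while node.children:
        node = max(node.children,
                   key=lambda n: n.value / (n.visits + 1e-9)
                   + c * math.sqrt(math.log(node.visits + 1) / (n.visits + 1e-9)))
    # Expansion: denoise the next subplan, one child per meta-action.
    if node.depth < S:
        node.children = expand(node)
        node = random.choice(node.children)
    # Simulation: jumpy denoising to a full plan, scored by the reward.
    reward = simulate(node)
    # Backpropagation: update value estimates along the path to the root.
    while node is not None:
        node.visits += 1
        node.value += reward
        node = node.parent
    return reward
```

Repeating this cycle within a fixed budget is what lets solution quality improve with inference-time computation: more iterations mean more subplan branches evaluated, pruned, and refined.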
MCTD outperforms previous diffusion-based planners on diverse planning tasks, including long-horizon mazes, robot-arm manipulation, and partially observable, visual long-horizon mazes.
As the inference-time computation budget increases, MCTD demonstrates its scalability by consistently achieving better performance.
(Figure: qualitative comparison of Diffusion Forcing and MCTD)
We introduced Monte Carlo Tree Diffusion (MCTD), a framework designed to combine the best of both worlds: the structured search of Monte Carlo Tree Search and the generative flexibility of diffusion planning to enhance the inference-time scalability of System 2 planning. MCTD leverages meta-actions for adaptive exploration-exploitation, tree-structured denoising for efficient diffusion-based expansion, and fast jumpy denoising for rapid simulation. Experimental results demonstrate that MCTD outperforms existing approaches in various planning tasks, achieving superior scalability and solution quality. Future work will explore adaptive compute allocation, learning-based meta-action selection, and reward shaping to further enhance performance, paving the way for more scalable and flexible System 2 planning.
@inproceedings{mctd,
title={Monte Carlo Tree Diffusion for System 2 Planning},
author={Yoon, Jaesik and Cho, Hyeonseo and Baek, Doojin and Bengio, Yoshua and Ahn, Sungjin},
booktitle={International Conference on Machine Learning},
year={2025},
}