Compositional Monte Carlo Tree Diffusion for Extendable Planning

Jaesik Yoon1,2, Hyeonseo Cho1, Sungjin Ahn1,3
1KAIST, 2SAP, 3NYU
NeurIPS 2025, Spotlight (top 3.2% of submissions)

Monte Carlo Tree Diffusion (MCTD) effectively tackles complex planning tasks by combining diffusion models with structured tree search. However, its core limitation is its inability to generate high-quality plans for trajectories longer than those seen during training. We propose Compositional MCTD (C-MCTD), a framework that overcomes this limitation by composing a globally coherent plan from smaller, individually generated trajectory segments.

Abstract

Monte Carlo Tree Diffusion (MCTD) integrates diffusion models with structured tree search to enable effective trajectory exploration through stepwise reasoning. However, MCTD remains fundamentally limited by training trajectory lengths. While periodic replanning allows plan concatenation for longer plan generation, the planning process remains locally confined, as MCTD searches within individual trajectories without access to global context. We propose Compositional Monte Carlo Tree Diffusion (C-MCTD), a framework that elevates planning from individual trajectory optimization to reasoning over complete plan compositions. C-MCTD introduces three complementary components: (1) Online Composer, which performs globally-aware planning by searching across entire plan compositions; (2) Distributed Composer, which reduces search complexity through parallel exploration from multiple starting points; and (3) Preplan Composer, which accelerates inference by leveraging cached plan graphs.

Preliminary: Monte Carlo Tree Diffusion (MCTD)

Our work builds upon Monte Carlo Tree Diffusion (MCTD), a framework that reframes trajectory planning as a tree search problem by integrating the generative power of diffusion models with the structured search capabilities of MCTS. The framework is built on three key concepts:

1. Denoising as Tree Rollout
Unlike conventional MCTS which performs rollouts at the state level, MCTD operates on subplans—segments of the full trajectory. A complete trajectory $\mathbf{x}$ is partitioned into a sequence $[\mathbf{x}_1, \dots, \mathbf{x}_S]$. By generating these semi-autoregressively, MCTD performs intermediate evaluations akin to MCTS rollouts. This approach effectively reduces the search tree's depth while preserving the global coherence of diffusion models. $$ \begin{align} p(\mathbf{x}) \approx \prod_{s=1}^S p(\mathbf{x}_s|\mathbf{x}_{1:s-1}) \end{align} $$
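
As a concrete illustration, here is a minimal Python sketch of this semi-autoregressive factorization. The function `denoise_subplan` is a hypothetical stand-in for the diffusion model's conditional sampler $p(\mathbf{x}_s|\mathbf{x}_{1:s-1})$, not part of any released code.

    import numpy as np

    def denoise_subplan(prefix, subplan_len, state_dim, rng):
        # Placeholder for the conditional sampler p(x_s | x_{1:s-1}); a real
        # implementation would run reverse diffusion conditioned on the prefix.
        return rng.normal(size=(subplan_len, state_dim))

    def generate_trajectory(num_subplans=4, subplan_len=16, state_dim=2, seed=0):
        rng = np.random.default_rng(seed)
        subplans = []
        for s in range(num_subplans):
            prefix = np.concatenate(subplans) if subplans else np.empty((0, state_dim))
            x_s = denoise_subplan(prefix, subplan_len, state_dim, rng)
            # An intermediate evaluation (the MCTS-style "rollout") could score x_{1:s} here.
            subplans.append(x_s)
        return np.concatenate(subplans)  # the full trajectory x = [x_1, ..., x_S]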

2. Guidance Levels as Meta-Actions
To manage the exploration-exploitation trade-off, MCTD introduces guidance levels as meta-actions. For each subplan $\mathbf{x}_s$, a discrete control mode, such as $\text{GUIDE}$ or $\text{NO\_GUIDE}$, is selected. By dynamically controlling the guidance level $g_s$ for each subplan, MCTD can balance exploration and exploitation within a single, unified denoising process. $$ \begin{align} p(\mathbf{x}|\mathbf{g}) \approx \prod_{s=1}^S p(\mathbf{x}_s|\mathbf{x}_{1:s-1}, g_s) \end{align} $$
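
The toy sketch below shows how a guidance level can act as a meta-action: the same denoising update is applied with or without a goal-pulling term depending on the chosen mode. The update rule and `goal_gradient` are illustrative placeholders, not the paper's exact formulation.

    import numpy as np

    def goal_gradient(x, goal):
        # Simple pull toward the goal state; stands in for a learned or
        # task-specific guidance signal.
        return goal - x

    def guided_denoise_step(x, noise_pred, goal, g_s, guide_scale=0.1):
        x_next = x - noise_pred  # placeholder reverse-diffusion update
        if g_s == "GUIDE":       # meta-action chosen by the tree search for this subplan
            x_next = x_next + guide_scale * goal_gradient(x_next, goal)
        return x_next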

3. Jumpy Denoising as Fast Simulation
For efficient simulation, MCTD utilizes a fast, jumpy denoising process based on the Denoising Diffusion Implicit Model (DDIM) (Song et al., 2020). This technique significantly accelerates the search process while maintaining high-quality trajectory generation for evaluation. $$ \begin{align} \tilde{\mathbf{x}}_{s+1:S} \sim p(\mathbf{x}_{s+1:S}|\mathbf{x}_{1:s}, \mathbf{g}) \end{align} $$
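
Below is a hedged sketch of a deterministic DDIM-style update that strides over the noise schedule in a few large jumps. Here `eps_model` is a hypothetical noise-prediction network $\epsilon_\theta(\mathbf{x}_t, t)$ and `alphas_cumprod` a standard cumulative noise schedule; the details are illustrative.

    import numpy as np

    def ddim_jumpy_sample(eps_model, x_T, alphas_cumprod, num_jumps=10):
        # Visit only a few timesteps (deterministic DDIM, eta = 0) instead of all T.
        T = len(alphas_cumprod)
        steps = np.linspace(T - 1, 0, num_jumps, dtype=int)  # coarse timestep schedule
        x = x_T
        for t, t_prev in zip(steps[:-1], steps[1:]):
            a_t, a_prev = alphas_cumprod[t], alphas_cumprod[t_prev]
            eps = eps_model(x, t)
            x0_pred = (x - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)      # predicted clean plan
            x = np.sqrt(a_prev) * x0_pred + np.sqrt(1.0 - a_prev) * eps  # jump to t_prev
        return x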

Four Canonical Steps of MCTD
MCTD implements these components through a modified MCTS procedure. The four canonical steps—Selection, Expansion, Simulation, and Backpropagation—are adapted to handle subplan generation within this diffusion-based planning process.
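
For readers less familiar with MCTS, the toy loop below sketches how the four steps fit together. The environment and reward are stubs; in MCTD, expansion would denoise the next subplan under a chosen meta-action and simulation would use jumpy denoising to preview the rest of the trajectory.

    import math
    import random

    class Node:
        def __init__(self, parent=None):
            self.parent, self.children = parent, []
            self.visits, self.value_sum = 0, 0.0

        def ucb(self, c=1.4):
            # Unvisited children are explored first.
            if self.visits == 0:
                return float("inf")
            exploit = self.value_sum / self.visits
            explore = c * math.sqrt(math.log(self.parent.visits) / self.visits)
            return exploit + explore

    def select(root):
        # Selection: descend to a leaf, picking children by UCB.
        node = root
        while node.children:
            node = max(node.children, key=lambda ch: ch.ucb())
        return node

    def expand(node, branching=2):
        # Expansion: create children (in MCTD, one per guidance meta-action).
        node.children = [Node(parent=node) for _ in range(branching)]
        return random.choice(node.children)

    def simulate(node):
        # Simulation: stand-in for scoring a fast, jumpy-denoised preview.
        return random.random()

    def backpropagate(node, value):
        # Backpropagation: push the simulated value up to the root.
        while node is not None:
            node.visits += 1
            node.value_sum += value
            node = node.parent

    root = Node()
    for _ in range(100):
        leaf = select(root)
        child = expand(leaf)
        backpropagate(child, simulate(child))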

Compositional Monte Carlo Tree Diffusion

While MCTD performs impressively on complex planning tasks, it struggles to generate plans that are longer than the trajectories seen during training. A common workaround is to simply generate a new plan when the old one ends. However, this "replanning" approach is inherently myopic—it makes decisions based only on the immediate situation, often leading to dead ends or inefficient, suboptimal paths.

To overcome this fundamental limitation, we propose Compositional Monte Carlo Tree Diffusion (C-MCTD). Our framework constructs a globally coherent, long-horizon plan by intelligently composing shorter, high-quality plan segments. This allows C-MCTD to reason about the entire problem and avoid the short-sighted pitfalls of simple replanning.

We developed three distinct variants of C-MCTD, each designed to tackle different scalability challenges:

  • Online Composer (OC): Builds the plan sequentially, using a systematic tree search to ensure each new segment connects coherently with the overall plan.
  • Distributed Composer (DC): Speeds up the search by expanding the plan from multiple points at once, effectively meeting in the middle.
  • Preplan Composer (PC): Achieves maximum efficiency by using a pre-computed "roadmap" of key waypoints to quickly find the optimal path in a graph-based manner.

Online Composer (OC)

Online Composer Overview

The Online Composer (OC) is our foundational approach to solving the myopic planning problem. Instead of simply reacting when a plan ends, OC systematically builds a long-term plan by connecting shorter, high-quality plan segments within a single, coherent tree search. This is achieved through three key components.

Stitching-Based Tree Expansion
To create a plan-level search tree, OC's nodes represent complete plans rather than partial steps. When the tree expands, OC generates a new plan segment $\mathbf{x}^m$ and stitches it to the end of the parent node's plan $\mathbf{x}^{1:m-1}$. This allows us to construct trajectories far longer than those the base diffusion planner was trained on, while using tree search to keep the composed plan globally coherent. $$ \begin{align} p_\theta(\mathbf{x}) \approx \prod_{m=1}^{M} p_\theta(\mathbf{x}^m \mid \mathbf{x}^{1:m-1}) \end{align} $$
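
A minimal sketch of plan-level nodes and stitching, assuming a hypothetical `generate_segment` in place of the conditional diffusion sampler $p_\theta(\mathbf{x}^m \mid \mathbf{x}^{1:m-1})$; the node class and defaults are illustrative.

    import numpy as np

    class PlanNode:
        """A node holding a complete stitched plan x^{1:m}, not just a partial step."""
        def __init__(self, plan, parent=None):
            self.plan = plan
            self.parent = parent
            self.children = []

    def generate_segment(prefix, seg_len, state_dim, rng):
        # Placeholder: a real sampler would run the diffusion model conditioned
        # on the prefix. Here we just continue smoothly from its last state.
        last = prefix[-1] if len(prefix) else np.zeros(state_dim)
        return last + rng.normal(scale=0.1, size=(seg_len, state_dim)).cumsum(axis=0)

    def expand(node, seg_len=16, state_dim=2, rng=None):
        rng = rng or np.random.default_rng()
        segment = generate_segment(node.plan, seg_len, state_dim, rng)
        child = PlanNode(np.concatenate([node.plan, segment]), parent=node)  # stitch onto the parent plan
        node.children.append(child)
        return child

    root = PlanNode(np.zeros((1, 2)))  # the current state as a length-1 plan
    leaf = expand(expand(root))        # two stitched segments -> a longer-than-training plan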

Guidance Sets as Meta-Actions
To improve decision-making, we generalize MCTD's "guidance level" into a guidance set $\mathcal{G}$. At each expansion step, instead of committing to a single guidance strategy (e.g., be conservative or be exploratory), OC considers a flexible set of options. This empowers the planner at inference time to select the most effective guidance for generating the next plan segment, leading to more diverse and higher-quality solutions. $$ \begin{align} p(\mathbf{x}|\mathcal{G}) \approx \prod_{m=1}^M p(\mathbf{x}^m|\mathbf{x}^{1:m-1}, \mathcal{G}^m) \end{align} $$
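
One way to picture this (a sketch, not the released implementation): expansion produces one candidate child per option in the guidance set, and the search then scores and keeps expanding the most promising composition. `sample_segment` and the option names are placeholders.

    GUIDANCE_SET = ("NO_GUIDE", "GOAL_GUIDE", "SUBGOAL_GUIDE")  # illustrative option names

    def expand_with_guidance_set(parent_plan, sample_segment):
        # One candidate child per guidance option in the set.
        children = []
        for g in GUIDANCE_SET:
            segment = sample_segment(parent_plan, guidance=g)
            children.append((g, parent_plan + segment))  # stitched child plan under option g
        return children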

Fast Replanning for Simulation
To keep the search process efficient, OC needs to quickly evaluate the potential of a partial plan. Inspired by "jumpy denoising," it uses fast replanning. Once a new plan segment is added, the remainder of the trajectory is rapidly generated using an accelerated denoising process. This provides a fast, approximate preview of the final outcome, allowing the search algorithm to evaluate rewards without performing a full, computationally expensive simulation. $$ \begin{align} \tilde{\mathbf{x}}^{m+1:M} \sim p(\mathbf{x}^{m+1:M}|\mathbf{x}^{1:m}, \mathcal{G}^m) \end{align} $$
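
A small sketch of this simulation step, with `jumpy_segment` standing in for the accelerated few-step sampler and `reward` for the task's evaluation function; both are assumptions for illustration.

    import numpy as np

    def fast_preview(prefix, num_remaining, jumpy_segment):
        # Cheaply complete the remaining segments with a coarse, few-step sampler.
        plan = prefix
        for _ in range(num_remaining):
            plan = np.concatenate([plan, jumpy_segment(plan)])
        return plan

    def simulate(prefix, num_remaining, jumpy_segment, reward):
        # Score the quick preview instead of a fully denoised plan.
        return reward(fast_preview(prefix, num_remaining, jumpy_segment))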

Limitations and the Next Step
While OC effectively extends planning beyond the limits of its training data, its sequential, piece-by-piece approach can become slow in extremely large environments due to the exponential growth of the search space. To address these scalability challenges, we developed our next two variants: Distributed Composer (DC) and Preplan Composer (PC).


Distributed / Preplan Composer Algorithms

Distributed Composer (DC)

While the Online Composer is effective, its single, sequential search can become prohibitively slow in vast, long-horizon planning tasks. To tackle this exponential complexity, we developed the Distributed Composer (DC).

The core idea is to "divide and conquer." Instead of launching one long search from a single start, DC initiates multiple, smaller MCTS searches in parallel from several strategic starting positions (waypoints) within the environment. These parallel searches explore their local areas and then connect with each other to form a global solution.

Guidance-Oriented Parallel Search
To make these parallel searches efficient, we guide each one toward task-relevant objectives $\mathcal{J}_{\phi}(\mathbf{x})$: each search tree rooted at a starting position $s_i$ is biased to expand in promising directions. This focuses computational effort on explorations that are most likely to contribute to a successful final plan, avoiding wasted effort. $$ \begin{align} \tilde{p}_{\theta}(\mathbf{x}|s_i) \propto p_{\theta}(\mathbf{x}|s_i) \exp(\mathcal{J}_{\phi}(\mathbf{x})) \end{align} $$
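
One simple way to realize such a bias (a sketch, assuming candidate segments can be drawn from the base planner) is self-normalized resampling in proportion to $\exp(\mathcal{J}_{\phi}(\mathbf{x}))$; `sample_segment` and `objective` are hypothetical stand-ins.

    import numpy as np

    def biased_expand(sample_segment, objective, num_candidates=8, rng=None):
        # Draw candidates from the base planner and resample them with weights
        # proportional to exp(J_phi(x)).
        rng = rng or np.random.default_rng()
        candidates = [sample_segment() for _ in range(num_candidates)]
        logits = np.array([objective(c) for c in candidates], dtype=float)
        weights = np.exp(logits - logits.max())  # stabilize before normalizing
        weights /= weights.sum()
        return candidates[rng.choice(len(candidates), p=weights)]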

Strategic Tree Connection
To connect the parallel searches, DC adds a connection between two trees only when a plan grown from one tree reaches the starting point of another (within a small distance threshold $\epsilon$). This transforms the problem into building a "roadmap" between key waypoints, dramatically reducing computational overhead.
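
A minimal sketch of the connection test, assuming Euclidean states: `plan_end` is the final state of a plan grown from one tree and `waypoints` are the other trees' starting positions.

    import numpy as np

    def try_connect(plan_end, waypoints, eps=0.5):
        # Return the index of a waypoint within eps of the plan's endpoint,
        # or None if the plan does not reach any other tree's root.
        dists = np.linalg.norm(np.asarray(waypoints) - plan_end, axis=1)
        j = int(np.argmin(dists))
        return j if dists[j] <= eps else None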

Efficient Path Synthesis
Once the parallel searches have run and formed a connectivity graph (our "roadmap"), DC uses a classic shortest-path algorithm such as Dijkstra's or A* to efficiently find the best path from start to goal over the roadmap. This final step synthesizes the local plans discovered by the parallel searches into a single, globally coherent solution.
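
For concreteness, here is a self-contained Dijkstra sketch over such a roadmap, where nodes are waypoint indices and edge weights are the costs (e.g., lengths) of the connecting plans. This is a generic implementation, not the paper's code.

    import heapq

    def dijkstra(graph, start, goal):
        # graph: {node: [(neighbor, cost), ...]}
        dist, prev = {start: 0.0}, {}
        frontier = [(0.0, start)]
        while frontier:
            d, u = heapq.heappop(frontier)
            if u == goal:
                break
            if d > dist.get(u, float("inf")):
                continue  # stale queue entry
            for v, w in graph.get(u, []):
                nd = d + w
                if nd < dist.get(v, float("inf")):
                    dist[v], prev[v] = nd, u
                    heapq.heappush(frontier, (nd, v))
        if goal not in dist:
            return None  # goal unreachable on the roadmap
        path = [goal]
        while path[-1] != start:
            path.append(prev[path[-1]])
        return path[::-1]

    roadmap = {0: [(1, 2.0)], 1: [(2, 1.5)], 2: []}
    print(dijkstra(roadmap, 0, 2))  # [0, 1, 2]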

Remaining Challenges
By parallelizing the search, DC significantly reduces the search depth and scales far better than OC. However, it still performs all of this computationally intensive tree search at inference time. To address this, we introduce our final and most efficient variant: Preplan Composer (PC).


Preplan Composer (PC)

Our final variant, the Preplan Composer (PC), addresses the last major hurdle: the heavy computational cost of performing tree searches at inference time. While the Distributed Composer parallelizes the search, it still has to do all that work "live." PC introduces a simple yet powerful idea: what if we do all the heavy exploration work beforehand?

PC shifts the expensive process of discovering paths into an offline, one-time pre-computation step. This allows for nearly instantaneous planning when a new task is given. It operates in two distinct phases.

Phase 1 (Offline): Building the Connectivity Graph
In the offline phase, we use our Online Composer to thoroughly explore the environment and build a comprehensive "roadmap" between the waypoints. This roadmap is stored as a graph where the nodes are strategic waypoints and the edges are high-quality, pre-computed plans that connect them. This computationally intensive step only needs to be performed once. $$ \begin{align} \tilde{p}_{\theta}(\mathbf{x}|s_i) \propto p_{\theta}(\mathbf{x}|s_i) \exp(-\text{dist}(\mathbf{x},s_j)) \end{align} $$
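
A rough sketch of the offline phase, with `compose_plan` standing in for an Online Composer run from one waypoint toward another and `end_distance` measuring how close the resulting plan gets to the target waypoint; both names are assumptions for illustration.

    def build_roadmap(waypoints, compose_plan, end_distance, eps=0.5):
        # For every ordered waypoint pair, try to grow a plan from s_i toward s_j
        # and cache it as a graph edge if it gets close enough.
        edges = {}
        for i, s_i in enumerate(waypoints):
            for j, s_j in enumerate(waypoints):
                if i == j:
                    continue
                plan = compose_plan(start=s_i, target=s_j)
                if plan is not None and end_distance(plan, s_j) <= eps:
                    edges[(i, j)] = plan  # pre-computed plan stored on the edge
        return edges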

Phase 2 (Online): Lightning-Fast Graph-Based Planning
Once a planning task is requested at inference time, the hard work is already done. PC simply connects the given start and goal states to their nearest waypoints in the pre-computed graph. Then, it uses a classical shortest-path finding algorithm (like A*) to find the optimal path along the existing roadmap. No complex tree search is required at inference time.
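
A minimal sketch of the online query, assuming Euclidean states and any shortest-path routine (such as the Dijkstra sketch above) over the cached roadmap.

    import numpy as np

    def plan_with_roadmap(start, goal, waypoints, shortest_path):
        # Snap the query states to their nearest cached waypoints, then defer to
        # a shortest-path search over the pre-computed graph; no tree search runs here.
        waypoints = np.asarray(waypoints)
        nearest = lambda q: int(np.argmin(np.linalg.norm(waypoints - q, axis=1)))
        return shortest_path(nearest(start), nearest(goal))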

By pre-computing the solution space, PC makes long-horizon planning incredibly efficient at inference time, making it ideal for applications where speed is critical.

Experimental Results

To demonstrate the effectiveness of C-MCTD, we conducted a comprehensive evaluation on challenging benchmarks from OGBench, following the setup of the original MCTD paper. Our experiments cover three key domains: (1) long-horizon maze navigation, (2) multi-object robot arm manipulation, and (3) high-dimensional visual navigation.


Long-Horizon Maze Navigation 🗺️

First, we tested our methods on maze navigation tasks that require planning trajectories far longer than those seen during training. This experiment highlights how well C-MCTD generalizes beyond its training data.

Maze Navigation Results
  • Baselines Struggle to Scale: As the maze complexity increased (medium → giant), most baseline methods failed completely, with success rates dropping to 0% in the largest "giant" maze.
  • C-MCTD Shows Superior Scalability: In contrast, our C-MCTD variants, especially the Preplan Composer, achieved a perfect 100% success rate on the most difficult PointMaze-Giant task, demonstrating outstanding scalability.
  • Efficiency Gains: The Preplan Composer was also the fastest of the C-MCTD variants and produced the highest-quality (shortest) paths, clearly showing the benefits of its offline pre-planning strategy.

Multi-Object Manipulation 🤖

Next, we evaluated C-MCTD on a robotic arm manipulation task requiring it to stack multiple cubes in a specific order, a test of precise sequential planning.

Multi-Object Manipulation Results
  • Baselines Degrade with Complexity: As the number of cubes increased, the success rates of baseline methods dropped significantly (e.g., MCTD-Replan fell from 100% to 24%).
  • C-MCTD Excels at Compositional Reasoning: Our Online Composer, enhanced with a Plan Cache (to avoid regenerating the same plans repeatedly), maintained a high 82% success rate on the most difficult four-cube task, proving its ability to solve complex compositional problems.

High-Dimensional Visual Navigation 👀

Finally, we applied C-MCTD to a challenging visual navigation task where the agent must find its way from raw RGB image observations. This partially observable setting mimics real-world scenarios.

High-Dimensional Visual Navigation Results
  • Myopic Planning Fails: In this difficult setting, the limitations of myopic (short-sighted) planning were clear, with baselines performing very poorly on the more complex maze.
  • C-MCTD's Plan-Level Search Succeeds: Our Online Composer significantly outperformed all baselines, achieving a 54% success rate on the large maze (vs. 20% for the best baseline), highlighting the benefit of reasoning over the entire plan.
  • Future Research Direction: Interestingly, the Distributed Composer and Preplan Composer variants underperformed here. This suggests that identifying meaningful waypoints in high-dimensional visual space is a key challenge and an important direction for future work.

Conclusion

We introduced Compositional Monte Carlo Tree Diffusion (C-MCTD), a novel framework that scales diffusion-based planners to generate complex, long-horizon plans by composing shorter trajectories at inference time. C-MCTD integrates three complementary approaches: the Online Composer for flexible, on-the-fly plan generation; the Distributed Composer for scaling through parallelism; and the Preplan Composer for maximum efficiency via pre-computed plan graphs. Our extensive evaluations show that this compositional framework significantly outperforms conventional replanning strategies. Notably, the Preplan Composer solves tasks requiring trajectories up to $10\times$ longer than those seen during training. These results demonstrate that compositional scaling at inference time can dramatically enhance the reasoning capabilities of diffusion planners. Crucially, this is achieved without any model retraining, offering a practical and effective path toward more capable agents for challenging sequential decision-making problems.

BibTeX


        @article{yoon2025compositional,
          title={Compositional Monte Carlo Tree Diffusion for Extendable Planning},
          author={Yoon, Jaesik and Cho, Hyeonseo and Ahn, Sungjin},
          journal={arXiv preprint arXiv:2510.21361},
          year={2025}
        }