GRASP: Making Long-Horizon Planning Practical with Gradient-Based World Models

Introduction

Large learned world models can now predict long sequences of future observations in high-dimensional visual spaces and generalize across tasks far better than models from only a few years ago. Having a powerful predictive model, however, is not the same as using it effectively for control, learning, or planning. Long-horizon planning with modern world models remains fragile: the optimization becomes ill-conditioned, non-greedy structure creates bad local minima, and high-dimensional latent spaces introduce subtle failure modes. This article introduces GRASP, a gradient-based planner built on three ideas that together make long-horizon planning practical.

(Source: bair.berkeley.edu)

What is a World Model?

Today, the term world model is overloaded. It can refer to an explicit dynamics model or to an implicit internal state that a generative model relies on. For our purposes, a world model is a learned model that, given the current state and a sequence of future actions, predicts what will happen next. Formally, it defines a predictive distribution over future states, conditioned on the current state and the chosen actions, that approximates the environment's dynamics. Such models are becoming general-purpose simulators, but using them for planning requires overcoming significant optimization hurdles.
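As a concrete form of that definition, assume a Markovian model that predicts one step at a time (many world models instead condition on longer histories or operate in a learned latent space). The distribution then factorizes over the horizon T. Planning typically rolls out a deterministic prediction f_θ, for example the mean, from the current state s_0 and scores the result with a task cost; the goal-reaching cost below is one illustrative choice, not necessarily the one GRASP uses.

```latex
\begin{aligned}
p_\theta\left(s_{1:T} \mid s_0, a_{0:T-1}\right) &= \prod_{t=0}^{T-1} p_\theta\left(s_{t+1} \mid s_t, a_t\right), \\
\hat{s}_{t+1} &= f_\theta\left(\hat{s}_t, a_t\right), \qquad \hat{s}_0 = s_0, \\
a^\star_{0:T-1} &= \arg\min_{a_{0:T-1}} \left\lVert \hat{s}_T - s_{\text{goal}} \right\rVert^2 .
\end{aligned}
```

Minimizing this objective by backpropagating through the sequential rollout is the classic shooting approach to gradient-based planning.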

The Challenge of Long-Horizon Planning

When planning over many time steps, gradient-based methods face several obstacles. First, the optimization landscape becomes ill-conditioned: the gradient that reaches an early action is a product of one model Jacobian per intervening step, so it tends to vanish or explode as the horizon grows. Second, non-greedy structure, where reaching the goal requires first moving away from it, creates poor local minima that trap the optimizer. Third, high-dimensional latent spaces, common in vision-based models, force gradients through brittle state-input pathways, producing noisy or vanishing signals. All three problems worsen as the planning horizon extends, which makes naive gradient descent impractical.
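To see why the horizon matters, consider a toy case that is not taken from GRASP: scalar linear dynamics s_{t+1} = a*s_t + u_t. Unrolling gives s_T = a^T*s_0 + sum_t a^(T-1-t)*u_t, so the gradient of the final state with respect to action u_t is a^(T-1-t). The sketch below (function names are illustrative) computes these sensitivities and the spread between the largest and smallest one.

```python
from typing import List

# Toy illustration (not GRASP's model): scalar linear dynamics s_{t+1} = a*s_t + u_t.
# Unrolling gives s_T = a^T * s_0 + sum_t a^(T-1-t) * u_t.


def final_state_sensitivity(a: float, horizon: int) -> List[float]:
    """Return d s_T / d u_t for t = 0..horizon-1."""
    return [a ** (horizon - 1 - t) for t in range(horizon)]


def gradient_scale_spread(a: float, horizon: int) -> float:
    """Largest / smallest per-action gradient magnitude across the horizon."""
    grads = [abs(g) for g in final_state_sensitivity(a, horizon)]
    return max(grads) / min(grads)


if __name__ == "__main__":
    for a in (0.9, 1.0, 1.1):
        for horizon in (10, 100):
            g0 = final_state_sensitivity(a, horizon)[0]
            spread = gradient_scale_spread(a, horizon)
            print(f"a={a}, T={horizon}: ds_T/du_0={g0:.2e}, spread={spread:.2e}")
```

With a = 0.9 and T = 100, the first action's gradient is 0.9^99, roughly 3e-5, while the last action's is 1, so no single step size suits every action. A learned, nonlinear world model replaces the constant a with state-dependent Jacobians, but the gradient keeps the same product-over-time structure.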

GRASP: A Robust Gradient-Based Planner

GRASP tackles these problems with three core ideas, each addressing a specific weakness of existing approaches.

Lifting Trajectories into Virtual States

Instead of rolling the world model forward step by step and backpropagating through that entire chain, GRASP lifts the trajectory into a set of virtual states, one per time step, that are optimized alongside the actions. Because each model evaluation then depends only on one virtual state and one action, the computation can be parallelized across time, which accelerates optimization and improves gradient flow. Treating each time step's state as its own optimization variable removes the long sequential dependency that makes long-horizon planning intractable.
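A minimal sketch of the lifted formulation, under stated assumptions: a hypothetical scalar toy model f(s, u) = 0.9*s + u stands in for the learned world model, and consistency between virtual states is enforced with a quadratic penalty on the residuals r_t = x_{t+1} - f(x_t, u_t). This collocation-style objective is an assumption and may differ from GRASP's actual one. The virtual states x_1..x_T and the actions u_0..u_{T-1} are updated jointly by plain gradient descent; names such as plan_lifted are illustrative.

```python
from typing import List, Tuple


# Hypothetical toy "world model" with scalar state and action: s' = 0.9*s + u.
# It stands in for a learned network; here its partials are known in closed form.
def model(s: float, u: float) -> float:
    return 0.9 * s + u


def model_partials(s: float, u: float) -> Tuple[float, float]:
    """Return (df/ds, df/du) for the toy model."""
    return 0.9, 1.0


def lifted_grads(s0: float, xs: List[float], us: List[float], goal: float,
                 rho: float) -> Tuple[List[float], List[float]]:
    """Gradients of J = rho*sum_t r_t^2 + (x_T - goal)^2, r_t = x_{t+1} - f(x_t, u_t).

    xs[t] holds the virtual state x_{t+1}; x_0 = s0 is fixed.
    """
    T = len(us)
    prev = [s0] + xs[:-1]
    # Each model call depends only on its own (x_t, u_t): there is no sequential
    # rollout, so all T calls could be issued as one batched, parallel evaluation.
    r = [xs[t] - model(prev[t], us[t]) for t in range(T)]
    d = [model_partials(prev[t], us[t]) for t in range(T)]
    gx = [2.0 * rho * r[t] for t in range(T)]  # x_{t+1} as the target of r_t
    for t in range(1, T):                      # x_t as the model input of r_t
        gx[t - 1] -= 2.0 * rho * r[t] * d[t][0]
    gx[-1] += 2.0 * (xs[-1] - goal)            # terminal goal cost
    gu = [-2.0 * rho * r[t] * d[t][1] for t in range(T)]
    return gx, gu


def plan_lifted(s0: float, goal: float, horizon: int, rho: float = 1.0,
                lr: float = 0.05, iters: int = 1000) -> Tuple[List[float], List[float]]:
    xs = [s0] * horizon  # virtual states start at the initial state
    us = [0.0] * horizon
    for _ in range(iters):
        gx, gu = lifted_grads(s0, xs, us, goal, rho)
        xs = [x - lr * g for x, g in zip(xs, gx)]
        us = [u - lr * g for u, g in zip(us, gu)]
    return xs, us


def rollout(s0: float, us: List[float]) -> float:
    """Sequentially roll the toy model forward; used only to check the plan."""
    s = s0
    for u in us:
        s = model(s, u)
    return s


if __name__ == "__main__":
    xs, us = plan_lifted(s0=1.0, goal=3.0, horizon=20)
    print(f"virtual x_T = {xs[-1]:.4f}, true rollout s_T = {rollout(1.0, us):.4f}")
```

Every residual touches only one (x_t, u_t) pair, so with a neural world model the T forward and backward passes could run as a single batch instead of a T-step chain. The sequential rollout is used here only to confirm that the optimized actions actually reach the goal.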

Adding Stochasticity for Exploration

To escape poor local minima, GRASP injects stochasticity directly into the state iterates during optimization. This noise acts as a form of exploration, allowing the planner to sample diverse trajectories and avoid getting stuck in suboptimal regions. The stochasticity is carefully balanced to maintain stability while promoting discovery of better solutions.
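A minimal sketch of noise injection, assuming a Langevin-style rule x <- x - lr*grad(x) + sigma_k*sqrt(2*lr)*xi with Gaussian xi and a linearly annealed scale sigma_k. The post does not specify GRASP's noise form or schedule, so both are assumptions. The hypothetical double-well cost c(x) = (x^2 - 1)^2 + 0.3x stands in for a planning objective over a single state coordinate: it has a shallow local minimum near x ≈ 0.96 and a deeper one near x ≈ -1.04.

```python
import math
import random
from typing import Callable, List


def noisy_descent(grad_fn: Callable[[List[float]], List[float]], x0: List[float],
                  lr: float, sigma0: float, iters: int, seed: int = 0) -> List[float]:
    """Gradient descent on state iterates with annealed Gaussian noise.

    Assumed Langevin-style update: x <- x - lr*grad(x) + sigma_k*sqrt(2*lr)*N(0, 1),
    with sigma_k decaying linearly from sigma0 to zero over the first 80% of
    iterations, so the final iterations are noise-free and can settle into a minimum.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    noisy_iters = max(1, int(0.8 * iters))
    x = list(x0)
    for k in range(iters):
        sigma = sigma0 * max(0.0, 1.0 - k / noisy_iters)
        g = grad_fn(x)
        x = [xi - lr * gi + sigma * math.sqrt(2.0 * lr) * rng.gauss(0.0, 1.0)
             for xi, gi in zip(x, g)]
    return x


def double_well_cost(x: List[float]) -> float:
    """Hypothetical toy cost: shallow minimum near +0.96, deeper one near -1.04."""
    return sum((xi * xi - 1.0) ** 2 + 0.3 * xi for xi in x)


def double_well_grad(x: List[float]) -> List[float]:
    return [4.0 * xi * (xi * xi - 1.0) + 0.3 for xi in x]


if __name__ == "__main__":
    plain = noisy_descent(double_well_grad, [1.0], lr=0.01, sigma0=0.0, iters=2000)
    print(f"no noise: x={plain[0]:+.3f}, cost={double_well_cost(plain):+.3f}")
    for seed in range(4):
        x = noisy_descent(double_well_grad, [1.0], lr=0.01, sigma0=0.5, iters=2000, seed=seed)
        print(f"seed {seed}:   x={x[0]:+.3f}, cost={double_well_cost(x):+.3f}")
```

Without noise, the iterate stays in the shallow basin it started in. With noise, some runs cross the barrier into the deeper basin; which seeds escape depends on the random draws, and keeping the lowest-cost run across seeds is one simple way to exploit that diversity. Annealing the noise to zero is a design choice that lets each run finish as ordinary gradient descent.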

Reshaping Gradients for Clean Signals

One of the biggest bottlenecks in gradient-based planning is the gradient signal flowing from high-dimensional observations (like images) to actions. GRASP reshapes these gradients to avoid the brittle state-input pathways that plague vision models. By decoupling the gradient computation from the raw observation model, actions receive clean, useful signals that guide optimization effectively.
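The post does not spell out how the reshaping is implemented. One plausible reading, which is an assumption here, is a stop-gradient on the world model's state input inside a lifted objective with residuals r_t = x_{t+1} - f(x_t, u_t): actions still receive gradients through the model's action input, while the term that would backpropagate through the model's state input is dropped, so each virtual state is pulled only toward the model's prediction and the goal. The scalar toy model and all function names below are hypothetical.

```python
from typing import List, Tuple


# Hypothetical toy world model: s' = 0.9*s + u. Only its action partial is used;
# the state partial df/ds is deliberately never computed below.
def model(s: float, u: float) -> float:
    return 0.9 * s + u


def model_du(s: float, u: float) -> float:
    return 1.0


def reshaped_update(s0: float, xs: List[float], us: List[float], goal: float,
                    rho: float = 1.0) -> Tuple[List[float], List[float]]:
    """Update directions for J = rho*sum_t r_t^2 + (x_T - goal)^2 with a
    stop-gradient on the model's state input (an assumed reading of "reshaping").

    r_t = x_{t+1} - f(x_t, u_t), xs[t] holds x_{t+1}, and x_0 = s0 is fixed.
    The exact gradient w.r.t. x_t would also contain -2*rho*r_t*df/ds(x_t, u_t);
    that term is omitted, so nothing is backpropagated through the state input.
    """
    T = len(us)
    prev = [s0] + xs[:-1]
    r = [xs[t] - model(prev[t], us[t]) for t in range(T)]
    gx = [2.0 * rho * r[t] for t in range(T)]  # x_{t+1} only as the residual target
    gx[-1] += 2.0 * (xs[-1] - goal)            # terminal goal cost
    # Actions keep their gradient through the model's action input.
    gu = [-2.0 * rho * r[t] * model_du(prev[t], us[t]) for t in range(T)]
    return gx, gu


def plan_reshaped(s0: float, goal: float, horizon: int, lr: float = 0.05,
                  iters: int = 1000) -> Tuple[List[float], List[float]]:
    xs, us = [s0] * horizon, [0.0] * horizon
    for _ in range(iters):
        gx, gu = reshaped_update(s0, xs, us, goal)
        xs = [x - lr * g for x, g in zip(xs, gx)]
        us = [u - lr * g for u, g in zip(us, gu)]
    return xs, us


def rollout(s0: float, us: List[float]) -> float:
    """Roll the toy model forward sequentially to check the plan."""
    s = s0
    for u in us:
        s = model(s, u)
    return s


if __name__ == "__main__":
    xs, us = plan_reshaped(s0=1.0, goal=3.0, horizon=20)
    print(f"virtual x_T = {xs[-1]:.4f}, true rollout s_T = {rollout(1.0, us):.4f}")
```

Because these updates are no longer the exact gradient of a single objective, convergence is not automatic in general. In this linear toy, any fixed point has zero dynamics residuals and x_T at the goal, so the optimized actions reproduce the goal when rolled out. With an image-based world model, the dropped term is precisely the one that would require differentiating through the model's high-dimensional state input.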

Conclusion

GRASP demonstrates that gradient-based planning can be made robust for long horizons through careful design. By combining virtual state lifting, stochastic exploration, and gradient reshaping, it overcomes the fragility that has limited previous methods. This work, done with Mike Rabbat, Aditi Krishnapriyan, Yann LeCun, and Amir Bar, opens the door to more effective use of powerful world models in planning and control tasks. Future directions include extending these ideas to even longer horizons and more complex environments.