The trajectory planner is the decision-making core of an autonomous driving system. Its task: given the current scene, output a future trajectory that is safe, comfortable, and efficient. Most production systems today use some form of regression — minimizing the distance between predicted and ground-truth trajectories. Yet a growing body of research and engineering evidence suggests this approach has a fundamental flaw: it assumes the feasible set is convex when it is emphatically not. This article lays out the first-principles argument for why generative approaches (diffusion, autoregressive) are not merely improvements but necessary paradigm shifts.
1. The Non-Convexity of the Feasible Set
A set $S$ is convex if, for any two points $x, y \in S$, every point $\lambda x + (1 - \lambda)\, y$ with $\lambda \in [0, 1]$ on the line segment connecting them also belongs to $S$. In driving, this property fails dramatically:
Trajectory A goes left around the obstacle; trajectory B goes right. Both are valid. Their average drives straight into the obstacle — infeasible. The feasible set is not convex, and no amount of regularization changes this geometric fact.
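This failure is easy to reproduce numerically. The sketch below uses made-up geometry (a disk obstacle of radius 2 centered at (5, 0), sine-bump detours) to check feasibility of two valid trajectories and of their pointwise average:

```python
import numpy as np

# Hypothetical toy setup: trajectories are (T, 2) arrays of waypoints,
# and a disk obstacle of radius 2 sits at (5, 0) on the ego's path.
obstacle_center = np.array([5.0, 0.0])
obstacle_radius = 2.0

def is_feasible(traj):
    """A trajectory is feasible if every waypoint clears the obstacle."""
    d = np.linalg.norm(traj - obstacle_center, axis=1)
    return bool(np.all(d > obstacle_radius))

xs = np.linspace(0.0, 10.0, 11)
bump = 3.0 * np.sin(np.pi * xs / 10.0)       # detour amplitude
traj_left  = np.stack([xs,  bump], axis=1)   # go left (+y) around the obstacle
traj_right = np.stack([xs, -bump], axis=1)   # go right (-y) around the obstacle
traj_mean  = 0.5 * (traj_left + traj_right)  # the MSE-style average

print(is_feasible(traj_left), is_feasible(traj_right), is_feasible(traj_mean))
# True True False
```

The averaged trajectory fails the same feasibility check that both of its parents pass: it drives straight through the obstacle.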
2. Why Regression Fails: MSE Averages Modes
Regression with MSE loss minimizes:

$$\mathcal{L}_{\text{MSE}}(\theta) = \mathbb{E}_{(x, \tau) \sim \mathcal{D}}\left[\, \lVert f_\theta(x) - \tau \rVert^2 \,\right]$$

When the data distribution is multimodal (e.g., left detour and right detour are both common), the optimal MSE predictor outputs the conditional mean:

$$f^*(x) = \mathbb{E}\left[\tau \mid x\right]$$
This is not a bug in training — it is the mathematically correct solution to the wrong objective. The regression objective assumes a unimodal distribution centered on the mean, which is provably incorrect for non-convex feasible sets.
The MSE mean lands in the valley between two modes — a region of low probability density. The model outputs a trajectory that no human driver would ever take.
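A minimal numerical illustration, with made-up bimodal targets (lateral offsets of ±3 at the obstacle): the MSE-optimal predictor lands at the mean, in the lowest-density region of the data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy bimodal targets: half the demonstrations detour left (lateral offset
# y = +3 at the obstacle), half detour right (y = -3). Values are illustrative.
left  = rng.normal(+3.0, 0.1, size=5000)
right = rng.normal(-3.0, 0.1, size=5000)
y = np.concatenate([left, right])

# The MSE-optimal constant predictor is the sample mean of the targets.
mse_prediction = y.mean()
print(round(mse_prediction, 2))   # ≈ 0.0: dead center, inside the obstacle

# Compare density near the prediction with density near a mode.
hist, edges = np.histogram(y, bins=100, range=(-4, 4), density=True)
center_density = hist[np.searchsorted(edges, 0.0) - 1]
mode_density = hist[np.searchsorted(edges, 3.0) - 1]
print(center_density < mode_density)   # True: the mean lies in the valley
```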
3. GMM: A Patch, Not a Solution
Gaussian Mixture Models (GMM) with $K$ components attempt to address multimodality by learning $K$ means $\mu_1, \dots, \mu_K$. Each component's update is still the weighted average of the samples assigned to that component:

$$\mu_k = \frac{\sum_i \gamma_{ik}\, \tau_i}{\sum_i \gamma_{ik}}$$

where $\gamma_{ik}$ is the posterior responsibility of component $k$ for sample $\tau_i$.
This creates two problems:
- Spurious peaks: When two true modes are close, their Gaussian components can overlap and produce a false peak in the valley between them.
- Finite approximation: Gaussians are convex building blocks, and a finite collection of them can never perfectly tile a non-convex shape. There will always be “gaps” (non-zero probability where there should be none) and “dead corners” (a finite $K$ is insufficient to cover all modes).
GMM is a patch, not a solution. It uses a finite number of simple convex building blocks to approximate a complex non-convex shape. The approximation error is structural, not parametric — it cannot be fixed by increasing training data or tuning hyperparameters.
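The spurious-peak problem can be shown in closed form. With two mixture components whose widths are large relative to the separation of the modes they fit, the mixture density at the midpoint exceeds the density at either mode (component parameters below are illustrative):

```python
import numpy as np

def gauss(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Two true modes, each fit by one Gaussian component. When the component
# std (1.0) is large relative to the mode separation (modes at ±0.5), the
# mixture's maximum shifts into the valley between them: a spurious peak.
mu, sigma = 0.5, 1.0
mix = lambda x: 0.5 * gauss(x, -mu, sigma) + 0.5 * gauss(x, +mu, sigma)

print(mix(0.0) > mix(mu))   # True: the valley outranks the modes
```

The model reports its highest confidence exactly where no training sample lies.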
4. The Penalty Loss Illusion
A common engineering practice is to add penalty terms (collision, off-road, comfort) on top of the MSE loss:

$$\mathcal{L} = \mathcal{L}_{\text{MSE}} + \lambda_1\, \mathcal{L}_{\text{collision}} + \lambda_2\, \mathcal{L}_{\text{off-road}} + \lambda_3\, \mathcal{L}_{\text{comfort}}$$
This is equivalent to converting hard constraints into soft penalties via Lagrange multipliers. The approach is valid only when the optimization problem is convex. On a non-convex landscape, gradient descent from the MSE initialization can get trapped in local minima, and the penalty terms merely push the solution toward the nearest feasible boundary rather than the globally optimal trajectory.
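A toy 1-D version of this trap, with made-up weights: the data modes sit at x = ±2, the MSE term pulls toward the conditional mean at x = 0, and a quadratic collision penalty is active inside the obstacle |x| < 1. Gradient descent initialized near the MSE solution parks just inside the nearest obstacle boundary instead of reaching a feasible mode:

```python
import numpy as np

# Soft-penalty objective: quadratic MSE pull toward the conditional mean at
# x = 0 (the averaged mode) plus a collision penalty inside the obstacle
# |x| < 1. Feasible modes sit at x = ±2. All weights are illustrative.
def loss_grad(x, w_mse=0.1, w_pen=5.0):
    g = 2 * w_mse * x                                # pull toward the MSE mean
    if abs(x) < 1.0:                                 # inside the obstacle
        g += -2 * w_pen * (1 - abs(x)) * np.sign(x)  # push toward nearest boundary
    return g

x = 0.05   # start near (not exactly at) the MSE solution
for _ in range(2000):
    x -= 0.01 * loss_grad(x)

print(round(x, 3))   # ≈ 0.98: parked just inside the nearest boundary,
                     # still colliding, nowhere near the feasible mode at x = 2
```

The penalty reshapes the landscape locally but cannot teleport the solution across the non-convex gap to the feasible mode.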
The classical EM (Expectation Maximization) planner understood this well. It decomposed the problem into two stages:
- Step A (Path Decider): Choose a corridor (e.g., “go left”), cutting the non-convex space into a convex sub-region.
- Step B (Speed Optimizer): Solve a Quadratic Program (QP) within this convex sub-region to obtain a smooth trajectory.
The key insight: first find a convex sub-problem, then solve it. End-to-end regression skips Step A entirely, attempting to solve the non-convex problem in one shot.
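A compact sketch of the two-stage recipe, under toy assumptions (lateral offsets only, a single corridor waypoint y = 3 at the obstacle standing in for the path decider's choice): Step A fixes the corridor, and Step B becomes an equality-constrained QP, minimizing squared second differences, that a plain least-squares solve handles:

```python
import numpy as np

N = 21  # number of waypoints; all geometry below is illustrative

# Step A (path decider): choose the "left" corridor, which pins the lateral
# offset to y = 3 at the obstacle. This choice carves out a convex sub-problem.
fixed = {0: 0.0, N // 2: 3.0, N - 1: 0.0}   # start, corridor waypoint, goal

# Step B (optimizer): minimize squared second differences (a smoothness/
# acceleration proxy) subject to the pinned waypoints -- an equality-
# constrained QP with a closed-form least-squares solution.
D2 = np.zeros((N - 2, N))
for i in range(N - 2):
    D2[i, i:i + 3] = [1.0, -2.0, 1.0]

free = [i for i in range(N) if i not in fixed]
rhs = -D2[:, list(fixed)] @ np.array(list(fixed.values()))
y_free, *_ = np.linalg.lstsq(D2[:, free], rhs, rcond=None)

y = np.zeros(N)
y[free] = y_free
for i, v in fixed.items():
    y[i] = v

print(round(float(y[N // 4]), 2))   # intermediate offset between 0 and 3:
                                    # a smooth ramp toward the corridor waypoint
```

Swapping the corridor waypoint to y = -3 yields the symmetric right-hand solution; the point is that each discrete choice leaves behind a convex problem that is easy to solve exactly.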
5. Generative Models: Learning the Non-Convex Shape
Generative approaches take a fundamentally different path:
| Method | How it handles non-convexity |
|---|---|
| Diffusion | Directly learns the shape of the non-convex distribution via gradient/flow field |
| Autoregressive | Decomposes the joint distribution via chain rule into conditional distributions; converts a geometric problem into a sequential decision problem |
5.1 Diffusion: Learning the Contour
A diffusion model learns the score function $\nabla_\tau \log p(\tau \mid x)$, which points toward higher-density regions at every point in trajectory space. During sampling, it follows this gradient field from noise to data, naturally navigating around infeasible regions:

$$\tau_{t-1} = \tau_t + \frac{\epsilon}{2}\, \nabla_\tau \log p(\tau_t \mid x) + \sqrt{\epsilon}\, z_t, \qquad z_t \sim \mathcal{N}(0, I)$$
The score field naturally pushes samples away from infeasible regions (zero density) and toward high-density modes.
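A minimal demonstration using Langevin sampling with a hand-written (not learned) score for a bimodal 1-D distribution; in a real diffusion planner the score would be learned from data, and all constants here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

# Bimodal 1-D target distribution: modes at ±3 (left/right detours) with a
# near-zero-density valley at 0. The score is written analytically here.
mu, sigma = 3.0, 0.5

def score(y):
    """d/dy log p(y) for an equal-weight mixture of N(+mu, s^2) and N(-mu, s^2)."""
    a = np.exp(-0.5 * ((y - mu) / sigma) ** 2)
    b = np.exp(-0.5 * ((y + mu) / sigma) ** 2)
    return (a * (mu - y) + b * (-mu - y)) / (sigma ** 2 * (a + b))

# Langevin dynamics: follow the score field from broad noise toward the data.
y = rng.uniform(-8.0, 8.0, size=2000)
eps = 0.05
for _ in range(500):
    y += 0.5 * eps * score(y) + np.sqrt(eps) * rng.normal(size=y.shape)

valley_fraction = np.mean(np.abs(y) < 1.0)
print(valley_fraction < 0.01)   # True: essentially no samples land in the valley
```

Every sample ends up committed to one mode or the other; none averages into the infeasible middle.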
5.2 Autoregressive: Sequential Decision Decomposition
The autoregressive approach applies the chain rule to decompose the joint trajectory distribution:

$$p(\tau_{1:T} \mid x) = \prod_{t=1}^{T} p(\tau_t \mid \tau_{<t},\, x)$$
At each step, the model only needs to predict a local trajectory segment conditioned on the current state. Each local prediction faces a simpler distribution (often nearly unimodal at the step level), and the global multimodality emerges from the sequential composition of these choices.
This converts a geometric problem (find a trajectory in a non-convex set) into a sequential decision problem (at each step, choose the most likely next segment), which is precisely the regime where autoregressive models excel.
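A toy rollout makes the decomposition concrete. Here the per-step conditionals are hand-written rather than learned: the only stochastic branch is the first step (left vs. right), after which each conditional is nearly deterministic, yet the rollout population is globally bimodal:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy AR planner over 10 lateral steps. Each step's conditional is simple
# (nearly unimodal given the history): having committed to a side, the next
# offset just continues the detour. All numbers are illustrative.
def rollout():
    y, traj = 0.0, []
    for t in range(10):
        if t == 0:
            step = rng.choice([+0.5, -0.5])  # bimodal only at the branch point
        else:
            step = 0.3 * np.sign(y) + rng.normal(0.0, 0.02)  # continue the detour
        y += step
        traj.append(y)
    return np.array(traj)

trajs = np.stack([rollout() for _ in range(200)])
finals = trajs[:, -1]
print(np.all(np.abs(finals) > 1.0))   # True: every rollout commits to one side;
                                      # none averages into the middle
```

Global multimodality emerges from a single early decision, while every individual conditional remains easy to model.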
6. The Convergence: AR + Diffusion
The most promising direction combines both paradigms, leveraging their complementary strengths:
| | AR | Diffusion |
|---|---|---|
| Strength | Accurate single-step prediction; diversity via token vocabulary | Global trajectory coherence; smooth “error correction” over long horizons |
| Weakness | Exposure bias and compounding error over long rollouts | Cold-start problem: enormous search space from pure noise |
| Role in combination | Provides anchor trajectory near the data manifold | Refines anchor into globally coherent, smooth trajectory |
The synergy is clear:
- AR solves Diffusion’s cold-start: Instead of starting from Gaussian noise, diffusion begins from the AR-generated anchor — already near the manifold — vastly reducing the denoising burden.
- Diffusion solves AR’s drift: The global refinement step corrects compounding errors that accumulate in long autoregressive rollouts.
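A schematic (not a trained model) of the division of labor: an AR-style rollout produces a jagged, mode-committed anchor, and a short truncated refinement, here hand-written as gradient steps on a smoothness energy standing in for a learned denoiser, smooths it without leaving the chosen mode:

```python
import numpy as np

rng = np.random.default_rng(3)

# Stage 1 (AR): a noisy rollout supplies a jagged but mode-committed anchor
# near the data manifold (here: a left detour with drift 0.4 per step).
T = 20
anchor = np.cumsum(0.4 + rng.normal(0.0, 0.3, size=T))

def smoothness_grad(y):
    """Gradient of the sum of squared second differences of y."""
    g = np.zeros_like(y)
    d2 = y[:-2] - 2 * y[1:-1] + y[2:]
    g[:-2] += d2
    g[1:-1] += -2 * d2
    g[2:] += d2
    return 2 * g

# Stage 2 (truncated refinement): start from the anchor, not from pure noise,
# so only a short denoising-style polish is needed.
y = anchor.copy()
for _ in range(200):
    y -= 0.05 * smoothness_grad(y)

roughness = lambda z: np.sum((z[:-2] - 2 * z[1:-1] + z[2:]) ** 2)
print(roughness(y) < roughness(anchor))   # True: refined trajectory is smoother
```

The refinement never has to search the whole trajectory space; the anchor has already resolved the discrete left/right choice.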
This AR + Diffusion combination achieved top-ranking results on the NavSim benchmark (Chainflow-VLA, 94.05 PDMS) and has been validated in works like DiffusionDrive (anchor-based truncated diffusion) and GoalFlow (goal-point guided flow matching).
7. Summary
| Approach | Non-convex handling | Multimodality | Limitation |
|---|---|---|---|
| Regression (MSE) | None — outputs conditional mean | Fails: averages modes into infeasible region | Structural failure on non-convex sets |
| GMM | Partial — finite convex approximation | Limited by $K$; spurious peaks | Patch, not solution |
| MSE + Penalty Loss | Indirect via soft constraints | Same MSE mean, just pushed toward boundary | Only valid for convex sub-problems |
| Diffusion | Direct — learns the full distribution shape | Natural: samples from learned modes | Cold-start; may lack diversity without anchors |
| Autoregressive | Decomposition via chain rule | Natural: sequential choices compose to multimodality | Compounding error; frame inconsistency |
| AR + Diffusion | Both: decomposition + global refinement | Best of both: diverse anchors + coherent output | Engineering complexity; training cost |
The progression from regression to GMM to generative models is not a matter of incremental improvement. It reflects a fundamental recognition: the planning problem in autonomous driving is inherently non-convex, and any approach that ignores this geometric fact will produce artifacts that no amount of engineering patching can fix.
References
- DiffusionDrive: Truncated Diffusion Model for End-to-End Autonomous Driving (CVPR 2025)
- GoalFlow: Goal-Driven Flow Matching for Multimodal Trajectory Generation
- MotionLM: Multi-Agent Motion Forecasting as Language Modeling (Waymo, ICRA 2024)
- AlphaDrive: GRPO-based RL with Planning Reasoning for Autonomous Driving
- NavSim Benchmark
- Diffusion-Planner: Transformer-based Diffusion for Closed-Loop Planning