The trajectory planner is the decision-making core of an autonomous driving system. Its task: given the current scene, output a future trajectory that is safe, comfortable, and efficient. Most production systems today use some form of regression — minimizing the distance between predicted and ground-truth trajectories. Yet a growing body of research and engineering evidence suggests this approach has a fundamental flaw: it assumes the feasible set is convex when it is emphatically not. This article lays out the first-principles argument for why generative approaches (diffusion, autoregressive) are not merely improvements but necessary paradigm shifts.

1. The Non-Convexity of the Feasible Set

A set $S$ is convex if, for any two points $A, B \in S$, every point on the line segment connecting them also belongs to $S$. In driving, this property fails dramatically:

[Figure: an ego vehicle approaching an obstacle. Trajectory A: left detour (feasible); trajectory B: right detour (feasible); C = (A+B)/2: collision.]

Trajectory A goes left around the obstacle; trajectory B goes right. Both are valid. Their average $\frac{A+B}{2}$ drives straight into the obstacle — infeasible. The feasible set is not convex, and no amount of regularization changes this geometric fact.
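To make the geometry concrete, here is a minimal numeric sketch (the obstacle position, radius, and detour shapes are toy values assumed for illustration): both detours pass a collision check, while their point-wise average does not.

```python
# Minimal sketch (toy numbers): two feasible detours around a circular
# obstacle, and their point-wise average, which collides.
import numpy as np

obstacle_center = np.array([5.0, 0.0])   # assumed obstacle position
obstacle_radius = 1.0

x = np.linspace(0.0, 10.0, 50)                          # longitudinal samples
offset = 2.0 * np.exp(-0.5 * ((x - 5.0) / 1.5) ** 2)    # lateral bump near the obstacle

traj_A = np.stack([x,  offset], axis=1)  # left detour
traj_B = np.stack([x, -offset], axis=1)  # right detour
traj_C = 0.5 * (traj_A + traj_B)         # MSE-style average: goes straight

def collides(traj):
    dists = np.linalg.norm(traj - obstacle_center, axis=1)
    return bool(np.any(dists < obstacle_radius))

print(collides(traj_A), collides(traj_B), collides(traj_C))  # False False True
```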

2. Why Regression Fails: MSE Averages Modes

Regression with MSE loss minimizes:

$$\min \; \mathbb{E}\left[\| y_{\text{pred}} - y_{\text{gt}} \|^2\right]$$

When the data distribution is multimodal (e.g., left detour and right detour are both common), the optimal MSE predictor outputs the conditional mean:

$$y^* = \mathbb{E}[y_{\text{gt}} \mid x] = \frac{A + B}{2}$$

This is not a bug in training — it is the mathematically correct solution to the wrong objective. The regression objective assumes a unimodal distribution centered on the mean, which is provably incorrect for non-convex feasible sets.
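A tiny sketch of this effect (toy bimodal targets at ±2 are assumed): minimizing MSE by gradient descent converges to the conditional mean, a value the data never takes.

```python
# Minimal sketch: with bimodal targets (y = -2 or +2, equally likely), the
# MSE-optimal prediction converges to the mean (0), which never occurs in the data.
import numpy as np

rng = np.random.default_rng(0)
y_gt = rng.choice([-2.0, 2.0], size=10_000)    # two equally likely modes

y_pred = 1.5                                   # arbitrary initialization
lr = 0.1
for _ in range(200):
    grad = 2.0 * np.mean(y_pred - y_gt)        # d/dy_pred of E[(y_pred - y_gt)^2]
    y_pred -= lr * grad

print(y_pred)          # ~0.0: the conditional mean
print(np.mean(y_gt))   # same value; neither -2 nor +2
```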

[Figure: density over trajectory space with Mode A (left) and Mode B (right); the MSE mean lands in the low-density valley between them.]

The MSE mean lands in the valley between two modes — a region of low probability density. The model outputs a trajectory that no human driver would ever take.

3. GMM: A Patch, Not a Solution

Gaussian Mixture Models (GMMs) with $K$ components attempt to address multimodality by learning $K$ means. Each component's mean update $\mu_i$ is still the weighted average of the samples assigned to that component:

$$\mu_i = \frac{\sum_n \gamma_{n,i}\, y_n}{\sum_n \gamma_{n,i}}$$
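As a concrete illustration, here is one hand-written EM iteration on toy 1-D samples (all values assumed): the M-step mean is exactly the responsibility-weighted average above, so each $\mu_i$ is always a convex combination of the data.

```python
# Minimal sketch of one EM iteration for a 1-D, two-component GMM.
import numpy as np
from scipy.stats import norm

y = np.array([-2.1, -1.9, -2.0, 1.9, 2.0, 2.1])   # toy bimodal samples
mu, sigma, pi = np.array([-1.0, 1.0]), np.array([1.0, 1.0]), np.array([0.5, 0.5])

# E-step: responsibilities gamma[n, i]
lik = pi * norm.pdf(y[:, None], loc=mu, scale=sigma)   # shape (N, K)
gamma = lik / lik.sum(axis=1, keepdims=True)

# M-step: each mean is still a weighted average of the assigned samples
mu_new = (gamma * y[:, None]).sum(axis=0) / gamma.sum(axis=0)
print(mu_new)   # moves toward (-2, +2), but remains a convex combination of the y values
```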

This creates two problems:

  1. Spurious peaks: When two true modes are close, their Gaussian components can overlap and produce a false peak in the valley between them.
  2. Finite approximation: $K$ Gaussians are a finite set of convex building blocks. A non-convex shape can never be perfectly tiled by convex pieces. There will always be “gaps” (non-zero probability where there should be none) and “dead corners” (insufficient $K$ to cover all modes).
[Figure: true bimodal distribution vs. a GMM with K=2 whose overlapping components place spurious density in the valley.]

GMM is a patch, not a solution. It uses a finite number of simple convex building blocks to approximate a complex non-convex shape. The approximation error is structural, not parametric — it cannot be fixed by increasing training data or tuning hyperparameters.
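The spurious-peak failure is easy to reproduce numerically. In the sketch below (component means and width are assumed toy values), two equal-weight Gaussians whose means lie within roughly two standard deviations of each other produce a mixture whose density is highest at the midpoint between them, i.e., in the valley.

```python
# Minimal sketch of the "spurious peak": when two components overlap,
# the mixture density peaks between them rather than at either mean.
import numpy as np
from scipy.stats import norm

mu_a, mu_b, sigma = -1.0, 1.0, 1.2          # overlapping components (assumed values)

def mixture_pdf(x):
    return 0.5 * norm.pdf(x, mu_a, sigma) + 0.5 * norm.pdf(x, mu_b, sigma)

print(mixture_pdf(0.0))    # ~0.235: density at the valley midpoint
print(mixture_pdf(mu_a))   # ~0.208: density at a component mean
# The mixture assigns MORE density to the valley than to either mode center.
```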

4. The Penalty Loss Illusion

A common engineering practice is to add penalty terms (collision, off-road, comfort) on top of MSE loss:

$$\mathcal{L} = \mathcal{L}_{\text{MSE}} + \lambda_1 \mathcal{L}_{\text{collision}} + \lambda_2 \mathcal{L}_{\text{off-road}} + \lambda_3 \mathcal{L}_{\text{comfort}} + \cdots$$

This amounts to converting hard constraints into soft penalties, in the spirit of Lagrange multipliers. That construction recovers the constrained optimum only when the underlying problem is convex. On a non-convex landscape, gradient descent from the MSE initialization can get trapped in local minima, and the penalty terms merely push the solution toward the nearest feasible boundary rather than toward the globally optimal trajectory.
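A one-dimensional toy (all numbers assumed) shows the effect: with bimodal targets at ±2 and a soft collision penalty for |y| < 1, gradient descent from the MSE solution settles near the obstacle boundary rather than at either feasible mode.

```python
# Minimal sketch: MSE toward bimodal data at ±2 plus a soft collision penalty
# for |y| < 1. The optimizer ends up near the boundary, not at a mode.
import numpy as np

rng = np.random.default_rng(0)
y_gt = rng.choice([-2.0, 2.0], size=10_000)      # feasible modes: left / right detour
lam = 10.0                                       # collision penalty weight (assumed)

def grad(y):
    g_mse = 2.0 * np.mean(y - y_gt)                            # pulls toward the mean (0)
    g_col = -2.0 * lam * max(0.0, 1.0 - abs(y)) * np.sign(y)   # pushes out of |y| < 1
    return g_mse + g_col

y = 0.01                                         # start near the pure-MSE optimum
for _ in range(2000):
    y -= 0.01 * grad(y)

print(y)   # ~0.9: hugs the obstacle boundary instead of reaching a mode at ±2
```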

The classical EM (Expectation Maximization) planner understood this well. It decomposed the problem into two stages:

[Diagram: Step A (Path Decider) selects a corridor, turning the non-convex problem into a convex sub-region; Step B (Speed Optimizer) solves a QP within it; result: a smooth, feasible trajectory.]
  1. Step A (Path Decider): Choose a corridor (e.g., “go left”), cutting the non-convex space into a convex sub-region.
  2. Step B (Speed Optimizer): Solve a Quadratic Program (QP) within this convex sub-region to obtain a smooth trajectory.

The key insight: first find a convex sub-problem, then solve it. End-to-end regression skips Step A entirely, attempting to solve the non-convex problem in one shot.
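A minimal sketch of the two-stage idea, with assumed corridor bounds and a generic box-constrained solver standing in for a production QP: committing to the left corridor turns the obstacle constraint into simple bounds, and the smoothing objective inside those bounds is convex.

```python
# Minimal sketch (toy numbers): Step A picks the "left" corridor as box
# constraints; Step B solves a small convex smoothing problem inside it.
import numpy as np
from scipy.optimize import minimize

N = 20
lo, hi = np.full(N, -0.5), np.full(N, 3.0)   # default corridor (lateral offset, m)
lo[8:13] = 1.5                               # Step A: obstacle ahead -> commit to the LEFT corridor
lo[0] = hi[0] = 0.0                          # start on the lane center
lo[-1] = hi[-1] = 0.0                        # return to the lane center

def smoothness(y):
    return np.sum(np.diff(y, n=2) ** 2)      # Step B: convex (quadratic) objective

res = minimize(smoothness, x0=np.clip(np.zeros(N), lo, hi),
               bounds=list(zip(lo, hi)), method="L-BFGS-B")
print(np.round(res.x, 2))                    # smooth swerve to ~1.5 around indices 8..12
```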

5. Generative Models: Learning the Non-Convex Shape

Generative approaches take a fundamentally different path:

| Method | How it handles non-convexity |
| --- | --- |
| Diffusion | Directly learns the shape of the non-convex distribution via a gradient/flow field |
| Autoregressive | Decomposes the joint distribution via the chain rule into conditional distributions; converts a geometric problem into a sequential decision problem |

5.1 Diffusion: Learning the Contour

A diffusion model learns the score function $\nabla_y \log p(y \mid x)$, which points toward higher-density regions at every point in trajectory space. During sampling, it follows this gradient field from noise to data, naturally navigating around infeasible regions:

[Figure: score field over trajectory space with feasible modes A and B and an infeasible region; samples starting from noise flow toward Mode A or Mode B.]

The score field naturally pushes samples away from infeasible regions (zero density) and toward high-density modes.
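A small sketch of this behavior (toy bimodal density with assumed modes at lateral offsets ±2): following the analytic score field from random initializations always terminates in one of the modes and never in the valley, which is exactly what sampling from a learned score aims to reproduce.

```python
# Minimal sketch: the score of a bimodal density points away from the
# low-density valley, so following it lands in a mode, never at the mean (0).
import numpy as np
from scipy.stats import norm

modes, sigma = np.array([-2.0, 2.0]), 0.5     # assumed toy distribution

def score(y):
    """d/dy log p(y) for an equal-weight two-Gaussian mixture."""
    w = norm.pdf(y, modes, sigma)
    return np.sum(w * (modes - y)) / (np.sum(w) * sigma**2)

rng = np.random.default_rng(0)
endpoints = []
for _ in range(200):
    y = rng.normal(0.0, 2.0)                  # "noise" initialization
    for _ in range(500):
        y += 0.01 * score(y)                  # follow the gradient field
    endpoints.append(y)

print(np.unique(np.round(endpoints, 1)))      # [-2.  2.]  -- no sample in the valley
```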

5.2 Autoregressive: Sequential Decision Decomposition

The autoregressive approach applies the chain rule to decompose the joint trajectory distribution:

$$p(S_{1:T} \mid \text{Env}) = \prod_{t=1}^{T} p(S_{t:t+n} \mid S_{<t}, \text{Env})$$

At each step, the model only needs to predict a local trajectory segment conditioned on the current state. Each local prediction faces a simpler distribution (often nearly unimodal at the step level), and the global multimodality emerges from the sequential composition of these choices.

This converts a geometric problem (find a trajectory in a non-convex set) into a sequential decision problem (at each step, choose the most likely next segment), which is precisely the regime where autoregressive models excel.
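The sketch below illustrates the decomposition with a hand-written toy conditional (all thresholds and step sizes are assumed): each step's choice is simple, yet the rollouts as a whole cover both global modes without ever averaging them.

```python
# Minimal sketch of the chain-rule view: each step samples a short segment
# conditioned on the history; global multimodality emerges from composition.
import numpy as np

rng = np.random.default_rng(0)

def next_segment(history):
    """p(s_t | s_<t): near-unimodal once a direction has been committed."""
    if abs(history[-1]) < 0.5:                    # still undecided near the lane center
        direction = rng.choice([-1.0, 1.0])       # the only genuinely multimodal choice
    else:
        direction = np.sign(history[-1])          # stay consistent with the commitment
    return history[-1] + 0.4 * direction + 0.05 * rng.normal()

final_offsets = []
for _ in range(1000):
    traj = [0.0]
    for _ in range(8):                            # T = 8 short segments
        traj.append(next_segment(traj))
    final_offsets.append(traj[-1])

print(np.round(np.mean(np.array(final_offsets) > 0), 2))  # ~0.5: both modes are covered
print(np.round(np.mean(np.abs(final_offsets)), 1))        # ~3: rollouts commit to a side, none averages to 0
```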

6. The Convergence: AR + Diffusion

The most promising direction combines both paradigms, leveraging their complementary strengths:

| | AR | Diffusion |
| --- | --- | --- |
| Strength | Accurate single-step prediction; diversity via token vocabulary | Global trajectory coherence; smooth “error correction” over long horizons |
| Weakness | Exposure bias and compounding error over long rollouts | Cold-start problem: enormous search space from pure noise |
| Role in combination | Provides anchor trajectory near the data manifold | Refines the anchor into a globally coherent, smooth trajectory |

The synergy is clear:

  • AR solves Diffusion’s cold-start: Instead of starting from Gaussian noise, diffusion begins from the AR-generated anchor — already near the manifold — vastly reducing the denoising burden.
  • Diffusion solves AR’s drift: The global refinement step corrects compounding errors that accumulate in long autoregressive rollouts.

This AR + Diffusion combination achieved top-ranking results on the NavSim benchmark (Chainflow-VLA, 94.05 PDMS) and has been validated in works like DiffusionDrive (anchor-based truncated diffusion) and GoalFlow (goal-point guided flow matching).
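The sketch below is only a schematic of the anchor-then-refine idea, not a reproduction of any of these systems: a toy AR rollout stands in for the anchor policy, and a few local-smoothing steps that start from the lightly noised anchor stand in for truncated diffusion.

```python
# Minimal sketch (stand-in components, not a real model): an AR "policy"
# proposes a coarse anchor near the manifold; a short truncated refinement,
# starting from the noised anchor rather than pure noise, smooths it.
import numpy as np

rng = np.random.default_rng(0)

def ar_anchor(T=20):
    """Toy AR rollout: commit left and step outward, one segment at a time."""
    y = [0.0]
    for _ in range(T - 1):
        y.append(y[-1] + 0.2 + 0.1 * rng.normal())    # coarse, slightly jittery
    return np.array(y)

def refine(traj, steps=8):
    """Stand-in for truncated diffusion: begin near the anchor, denoise briefly."""
    x = traj + 0.1 * rng.normal(size=traj.shape)       # light noise, not pure noise
    for _ in range(steps):
        s = np.empty_like(x)
        s[1:-1] = (x[:-2] + x[1:-1] + x[2:]) / 3.0     # surrogate "denoiser": local averaging
        s[0], s[-1] = (x[0] + x[1]) / 2.0, (x[-2] + x[-1]) / 2.0
        x = 0.5 * x + 0.5 * s                          # step toward the smoother manifold
    return x

anchor = ar_anchor()
final = refine(anchor)
# Curvature (second differences) of the refined trajectory is typically much
# smaller than that of the raw anchor.
print(np.abs(np.diff(anchor, n=2)).max(), np.abs(np.diff(final, n=2)).max())
```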

[Diagram: AR sequential token prediction provides diversity and an anchor; diffusion global refinement acts as a smoothing filter providing coherence; the refined output is diverse, coherent, and smooth.]

7. Summary

| Approach | Non-convex handling | Multimodality | Limitation |
| --- | --- | --- | --- |
| Regression (MSE) | None: outputs the conditional mean | Fails: averages modes into an infeasible region | Structural failure on non-convex sets |
| GMM | Partial: finite convex approximation | Limited by $K$; spurious peaks | Patch, not solution |
| MSE + penalty loss | Indirect, via soft constraints | Same MSE mean, just pushed toward a boundary | Only valid for convex sub-problems |
| Diffusion | Direct: learns the full distribution shape | Natural: samples from learned modes | Cold start; may lack diversity without anchors |
| Autoregressive | Decomposition via the chain rule | Natural: sequential choices compose into multimodality | Compounding error; frame inconsistency |
| AR + Diffusion | Both: decomposition + global refinement | Best of both: diverse anchors + coherent outputs | Engineering complexity; training cost |

The progression from regression to GMM to generative models is not a matter of incremental improvement. It reflects a fundamental recognition: the planning problem in autonomous driving is inherently non-convex, and any approach that ignores this geometric fact will produce artifacts that no amount of engineering patching can fix.

References