<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Autonomous Driving on Xu'Blog</title><link>https://xuquant.com/categories/autonomous-driving/</link><description>Recent content in Autonomous Driving on Xu'Blog</description><image><title>Xu'Blog</title><url>https://xuquant.com/images/profile.jpg</url><link>https://xuquant.com/images/profile.jpg</link></image><generator>Hugo -- 0.152.2</generator><language>en</language><lastBuildDate>Fri, 08 May 2026 18:00:00 +0800</lastBuildDate><atom:link href="https://xuquant.com/categories/autonomous-driving/index.xml" rel="self" type="application/rss+xml"/><item><title>ReflectDrive-2: Li Auto's Discrete Diffusion End-to-End Driving with Joint RL Optimization</title><link>https://xuquant.com/posts/reflectdrive-2-discrete-diffusion-end-to-end-driving/</link><pubDate>Fri, 08 May 2026 18:00:00 +0800</pubDate><guid>https://xuquant.com/posts/reflectdrive-2-discrete-diffusion-end-to-end-driving/</guid><description>An in-depth look at Li Auto's ReflectDrive-2: the first discrete diffusion model for end-to-end autonomous driving planning, introducing a three-stage "decide-draft-reflect" reasoning paradigm. Joint reinforcement-learning optimization of drafting and editing amplifies the AutoEdit gain 6x, reaching a NAVSIM SOTA of 91.0 PDMS from camera-only input, with real-time deployment at 31.8 ms/frame on the Thor chip.</description></item><item><title>X-Cache: XPeng's Inference-Acceleration Infrastructure for Autonomous Driving World Models</title><link>https://xuquant.com/posts/xpeng-x-cache-world-model-inference-acceleration/</link><pubDate>Thu, 07 May 2026 18:00:00 +0800</pubDate><guid>https://xuquant.com/posts/xpeng-x-cache-world-model-inference-acceleration/</guid><description>An in-depth look at XPeng's X-Cache: cross-segment residual caching delivers a 2.7x world-model inference speedup, skipping 71% of DiT blocks with near-zero quality loss, as a training-free inference optimization scheme for autonomous driving.</description></item><item><title>Reinforcement Learning for End-to-End Autonomous Driving: From Offline DPO to Iterative Self-Improvement</title><link>https://xuquant.com/posts/autodrive/basic_rl/</link><pubDate>Tue, 20 Jan 2026 10:00:00 +0800</pubDate><guid>https://xuquant.com/posts/autodrive/basic_rl/</guid><description>Comprehensive analysis of applying reinforcement learning to end-to-end
autonomous driving, covering metric caching, Direct Preference Optimization (DPO) across action representations, and strategies for breaking sampling ceilings in iterative self-improvement.</description></item><item><title>Vision-Language-Action Models for Autonomous Driving: The Cosmos-Reason Approach</title><link>https://xuquant.com/posts/autodrive/nvidia_vla/</link><pubDate>Sun, 11 Jan 2026 10:00:00 +0800</pubDate><guid>https://xuquant.com/posts/autodrive/nvidia_vla/</guid><description>Technical deep-dive into Nvidia's Cosmos-Reason (Alpamayo) VLA system for autonomous driving, covering tri-plane vision encoding, ego-shortcut avoidance, Cause-of-Change dataset paradigm, and reasoning-action alignment via reinforcement learning.</description></item><item><title>End-to-End Autonomous Driving: From Modular Decoders to VLA Architectures</title><link>https://xuquant.com/posts/autodrive/e2e-autonomous-driving-evolution/</link><pubDate>Thu, 01 May 2025 10:00:00 +0800</pubDate><guid>https://xuquant.com/posts/autodrive/e2e-autonomous-driving-evolution/</guid><description>A technical survey on the architectural evolution of end-to-end autonomous driving, covering planner decoder selection (AR vs Diffusion vs Flow Matching), VLA integration strategies, and engineering best practices for data infrastructure, training optimization, and evaluation systems.</description></item><item><title>Policy Optimization for End-to-End Autonomous Driving: From REINFORCE to GRPO</title><link>https://xuquant.com/posts/autodrive/rl-policy-optimization-e2e-driving/</link><pubDate>Wed, 30 Apr 2025 10:00:00 +0800</pubDate><guid>https://xuquant.com/posts/autodrive/rl-policy-optimization-e2e-driving/</guid><description>A systematic derivation of policy optimization methods for end-to-end autonomous driving: from REINFORCE through PPO to GRPO, covering advantage estimation, sampling differences between LLM and driving, multi-objective loss design, and the role of noise in diffusion-based
exploration.</description></item><item><title>Trajectory Tokenization for Autoregressive Planning: Clustering, Matching, and the AR+Diffusion Paradigm</title><link>https://xuquant.com/posts/autodrive/ar-trajectory-tokenization/</link><pubDate>Tue, 01 Apr 2025 10:00:00 +0800</pubDate><guid>https://xuquant.com/posts/autodrive/ar-trajectory-tokenization/</guid><description>A deep dive into trajectory tokenization for autoregressive driving planners: from state-based discretization via k-means clustering, through token matching and reconstruction, to the AR+Diffusion paradigm and GRPO-based reinforcement learning post-training.</description></item><item><title>Why Generative Planning? The Non-Convexity Argument Against Regression in Autonomous Driving</title><link>https://xuquant.com/posts/autodrive/generative-planning-nonconvex/</link><pubDate>Sat, 15 Mar 2025 10:00:00 +0800</pubDate><guid>https://xuquant.com/posts/autodrive/generative-planning-nonconvex/</guid><description>A first-principles analysis of why regression-based planners fail in autonomous driving: the feasible set is non-convex, MSE averages into obstacles, GMM is a patch not a solution, and generative approaches are necessary.</description></item></channel></rss>