<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Autonomous Driving: End-to-End, VLA, and Beyond on Xu'Blog</title><link>https://xuquant.com/posts/autonomous-driving/</link><description>Recent content in Autonomous Driving: End-to-End, VLA, and Beyond on Xu'Blog</description><image><title>Xu'Blog</title><url>https://xuquant.com/og-default.png</url><link>https://xuquant.com/og-default.png</link></image><generator>Hugo -- 0.152.2</generator><language>zh</language><lastBuildDate>Tue, 26 May 2026 10:00:00 +0800</lastBuildDate><atom:link href="https://xuquant.com/posts/autonomous-driving/index.xml" rel="self" type="application/rss+xml"/><item><title>Production VLA: 8 Engineering Judgment Calls + 4 Counterexamples</title><link>https://xuquant.com/posts/autonomous-driving/production-vla-engineering-tradeoffs/</link><pubDate>Tue, 26 May 2026 10:00:00 +0800</pubDate><guid>https://xuquant.com/posts/autonomous-driving/production-vla-engineering-tradeoffs/</guid><description>The trade-off logic behind 8 concrete choices in production VLA, including VLM training, trajectory head selection, cross-attention signals, variable-length stream membank, isomorphic feature distillation, the three-stage SFT-AFT-RL mix, and single-board deployment; plus 4 &amp;#39;tried it, did not help&amp;#39; counterexamples that mark the boundary of the search space.</description></item><item><title>Affordance vs Symbolic Perception in AD: Where the Binary Framing Goes Wrong</title><link>https://xuquant.com/posts/autonomous-driving/affordance-vs-symbolic-perception/</link><pubDate>Sun, 24 May 2026 11:00:00 +0800</pubDate><guid>https://xuquant.com/posts/autonomous-driving/affordance-vs-symbolic-perception/</guid><description>The AD community discusses affordance / symbolic as a dichotomy, but the word symbolic refers to both structured perception output and language output, benchmark rankings are inconsistent, and Wayve / Tesla / NIO, XPeng, and Li Auto all actually sit at hybrid positions, so this spectrum is a framing mismatch. What really determines production VLA is a handful of independent engineering axes.</description></item><item><title>Dense Latent Predictive Supervision in AD VLA: Why Pixel Is Not Optimal</title><link>https://xuquant.com/posts/autonomous-driving/dense-latent-predictive-supervision/</link><pubDate>Sun, 24 May 2026 10:00:00 
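+0800</pubDate><guid>https://xuquant.com/posts/autonomous-driving/dense-latent-predictive-supervision/</guid><description>AD VLA supervises a 2B+ parameter backbone with a sparse trajectory loss (12 waypoints × 2D = 24 scalars), an information-theoretic ratio of ~10⁻¹⁰; this supervision deficit is the core reason NAVSIM scores stall in the 87-93 range. DriveVLA-W0 fills the gap with pixel-level future image prediction, which is the right direction but not the optimal route. V-JEPA-style latent predictive supervision is friendlier on all three counts: capacity, inference cost, and isomorphism with evaluation.</description></item><item><title>3D Visual Representations for Autonomous Driving VLA: From Capability Boundaries to Engineering Injection</title><link>https://xuquant.com/posts/autonomous-driving/3d-vision-injection-for-ad-vla/</link><pubDate>Fri, 22 May 2026 10:00:00 +0800</pubDate><guid>https://xuquant.com/posts/autonomous-driving/3d-vision-injection-for-ad-vla/</guid><description>Engineering decisions for 3D injection at the vision tower when deploying an autonomous driving VLA system: starting from the geometric capabilities that driving actually needs, moving through latent space topology analysis, three sources of geometric priors, and five injection paths, and ending at the hard constraint of the on-vehicle inference budget, it gives a set of actionable decision principles.</description></item><item><title>4D Vision Encoder for Autonomous Driving: A Unified Review from the Information Bottleneck Perspective</title><link>https://xuquant.com/posts/autonomous-driving/4d-vision-encoder-for-autonomous-driving/</link><pubDate>Sun, 17 May 2026 10:00:00 +0800</pubDate><guid>https://xuquant.com/posts/autonomous-driving/4d-vision-encoder-for-autonomous-driving/</guid><description>Places nine 4D visual encoding schemes, including AR1 Tri-plane, Flex, MEM, Memory VLA, BEV/OCC, V-JEPA, DA3, and VGGT, in a single information bottleneck coordinate system, derives five necessary conditions for an ideal 4D encoder from the four-part structure of Y (perception / prediction / planning / reasoning), and gives an evaluation path for the 4V→7V upgrade on Qwen3.5.</description></item><item><title>Navigation Information Injection under VLA Semantics: From Prompt to Diffusion Condition</title><link>https://xuquant.com/posts/autonomous-driving/diffusion-planner-navigation-injection/</link><pubDate>Thu, 14 May 2026 10:00:00 +0800</pubDate><guid>https://xuquant.com/posts/autonomous-driving/diffusion-planner-navigation-injection/</guid><description>Commercial navigation SDKs in industry commonly output rich signals such as maneuver chains, auxiliary action semantics, lane-level guidance, and route waypoints, but the public datasets nuPlan/nuScenes/NAVSIM never captured these fields at collection time, so VLA research that consumes full navi can currently only be done on private datasets. Following the thread of &amp;#39;how VLA can efficiently consume the navigation information the industry already provides&amp;#39;, this post dissects four layers of injection mechanisms one by one (Prompt encoding, Adapter alignment, Diffusion conditioning, and unified spatial Tokens) and discusses the VLN continuous-interaction paradigm, covering SpaceDrive, SSR, DiffusionPlanner, GoalFlow, ONR/MAT, SGDrive, and other recent
work from 2024-2026.</description></item><item><title>Temporal Memory Mechanisms in VLMs: From Video Compression to Long- and Short-Term Memory Fusion</title><link>https://xuquant.com/posts/autonomous-driving/vlm-temporal-memory-mechanisms/</link><pubDate>Sat, 09 May 2026 06:00:00 +0800</pubDate><guid>https://xuquant.com/posts/autonomous-driving/vlm-temporal-memory-mechanisms/</guid><description>A systematic survey of mainstream approaches to temporal modeling in VLMs: the Nvidia Flex encoder, the LlamaFactory video processing pipeline, Qwen spatiotemporal compression, Pi 0.7 MEM spatiotemporally separable attention, and Memory VLA, with a detailed walkthrough of a zero-parameter MEM retrofit based on the Qwen3-VL engineering implementation.</description></item><item><title>ReflectDrive-2: Li Auto's Discrete Diffusion End-to-End Driving with Joint RL Optimization</title><link>https://xuquant.com/posts/autonomous-driving/reflectdrive-2-discrete-diffusion-end-to-end-driving/</link><pubDate>Sat, 25 Apr 2026 18:00:00 +0800</pubDate><guid>https://xuquant.com/posts/autonomous-driving/reflectdrive-2-discrete-diffusion-end-to-end-driving/</guid><description>An in-depth look at Li Auto's ReflectDrive-2: discrete diffusion for end-to-end planning, a three-stage &amp;#39;decide-draft-reflect&amp;#39; process paired with AutoEdit local correction, joint RL optimization that amplifies the AutoEdit gain 6x, 91.0 PDMS with camera-only input (NAVSIM v1 navtest), and 31.8 ms/frame on Thor.</description></item><item><title>Diffusion Models and Autonomous Driving Planning: From the Math of Denoising to Trajectory Generation</title><link>https://xuquant.com/posts/autonomous-driving/diffusion-for-driving/</link><pubDate>Sat, 08 Nov 2025 10:00:00 +0800</pubDate><guid>https://xuquant.com/posts/autonomous-driving/diffusion-for-driving/</guid><description>An in-depth walkthrough of diffusion model principles for autonomous driving: from variational inference in DDPM to straight-line coupling in Flow Matching, from conditional control via Classifier-Free Guidance to truncation-based acceleration in Truncated Diffusion, aiming to understand the &amp;#39;why&amp;#39; of each step rather than just the &amp;#39;how&amp;#39;.</description></item><item><title>Reinforcement Learning for End-to-End Autonomous Driving: From Offline DPO to Iterative Self-Improvement</title><link>https://xuquant.com/posts/autonomous-driving/basic_rl/</link><pubDate>Sat, 20 Sep 2025 10:00:00 +0800</pubDate><guid>https://xuquant.com/posts/autonomous-driving/basic_rl/</guid><description>A comprehensive analysis of applying reinforcement learning to end-to-end autonomous driving systems, covering the metric caching mechanism, DPO under different action representations, and strategies for breaking through the sampling ceiling of iterative self-improvement pipelines.</description></item><item><title>Alpamayo: A Reasoning-Action Aligned VLA System for Autonomous Driving</title><link>https://xuquant.com/posts/autonomous-driving/nvidia_vla/</link><pubDate>Sat, 30 Aug 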
2025 10:00:00 +0800</pubDate><guid>https://xuquant.com/posts/autonomous-driving/nvidia_vla/</guid><description>An in-depth technical analysis of the Nvidia Alpamayo VLA autonomous driving system, which uses Cosmos-Reason as its VLM backbone, covering tri-plane visual encoding, ego shortcut avoidance, the variation-factor dataset paradigm, and reasoning-action alignment achieved through reinforcement learning.</description></item><item><title>Policy Optimization for End-to-End Autonomous Driving: From REINFORCE to GRPO</title><link>https://xuquant.com/posts/autonomous-driving/rl-policy-optimization-e2e-driving/</link><pubDate>Sat, 09 Aug 2025 10:00:00 +0800</pubDate><guid>https://xuquant.com/posts/autonomous-driving/rl-policy-optimization-e2e-driving/</guid><description>A systematic derivation of policy optimization methods for end-to-end autonomous driving: from REINFORCE to PPO to GRPO, covering advantage estimation, the differences between LLM and driving sampling, multi-objective loss design, and the role of noise in diffusion model exploration.</description></item><item><title>End-to-End Autonomous Driving: From Modular Decoders to VLA Architectures</title><link>https://xuquant.com/posts/autonomous-driving/e2e-autonomous-driving-evolution/</link><pubDate>Sat, 19 Jul 2025 10:00:00 +0800</pubDate><guid>https://xuquant.com/posts/autonomous-driving/e2e-autonomous-driving-evolution/</guid><description>A technical survey of the evolution of end-to-end autonomous driving architectures, covering planner decoder choices (AR vs Diffusion vs Flow Matching), VLA integration strategies, and engineering best practices for data infrastructure, training optimization, and evaluation systems.</description></item><item><title>Trajectory Tokenization for Autoregressive Planning: Clustering, Matching, and the AR+Diffusion Paradigm</title><link>https://xuquant.com/posts/autonomous-driving/ar-trajectory-tokenization/</link><pubDate>Sat, 28 Jun 2025 10:00:00 +0800</pubDate><guid>https://xuquant.com/posts/autonomous-driving/ar-trajectory-tokenization/</guid><description>An in-depth look at trajectory tokenization methods for autoregressive driving planners: from state discretization based on k-means clustering, to token matching and reconstruction, to the AR+Diffusion paradigm and GRPO-based reinforcement learning post-training.</description></item><item><title>Why Generative Planning? 
The Non-Convexity Argument Against Regression in Autonomous Driving</title><link>https://xuquant.com/posts/autonomous-driving/generative-planning-nonconvex/</link><pubDate>Sat, 07 Jun 2025 10:00:00 +0800</pubDate><guid>https://xuquant.com/posts/autonomous-driving/generative-planning-nonconvex/</guid><description>A first-principles analysis of why regression-based planners fail in autonomous driving: the feasible region is non-convex, MSE averages modes onto obstacles, GMM is a patch rather than a solution, and generative methods are necessary.</description></item></channel></rss>