Welcome to Xu’Blog

Hello, I’m LexHsu. Pain is inevitable, but suffering is optional. I’ll be documenting my learning journey on this blog with immense patience, starting in 2025.

Qwen3.5 vs Qwen3: A Deep Architectural Comparison

中文版本: 阅读中文版 Figure from Qwen3.5-Omni Technical Report Based on Qwen3.5 official technical documentation and code structure analysis. 交互式架构对比下面是 Qwen3-VL 与 Qwen3.5 的交互式架构可视化，支持 Tab 切换、拖拽平移、滚轮缩放，点击节点查看详细信息。操作提示：点击顶部 Tab 切换 Qwen3-VL / Qwen3.5 / Compare 视图；滚轮缩放；拖拽平移；点击节点查看参数详情。 ...

CORAL: Autonomous Multi-Agent Evolution for Open-Ended Discovery

Introduction Figure from CORAL: Autonomous Multi-Agent Evolution for Open-Ended Discovery Open-ended discovery—the search for novel, high-quality solutions in domains where the solution space lacks clear structure and evaluation may be expensive or sparse—remains one of the hardest challenges in automated scientific reasoning. Unlike constrained optimization, where gradients or convexity guide the search, open-ended problems demand sustained exploration, accumulation of partial insights, and the ability to redirect effort when progress stalls. Mathematical conjecture proving, systems-level code optimization, and combinatorial design all fall squarely in this category. ...

InSpatio-World: Real-Time 4D World Simulation via Spatiotemporal Autoregressive Modeling

Figure from InSpatio-World: Real-Time 4D World Simulation via Spatiotemporal Autoregressive Modeling The ability to simulate a 4D world — one that evolves in time and can be viewed from arbitrary perspectives — is a foundational capability for autonomous driving, robotics, and embodied AI. Existing video generation models produce visually compelling sequences but lack spatial consistency when the camera moves. 3D reconstruction methods achieve geometric fidelity but struggle with dynamic scenes and real-time performance. InSpatio-World bridges this gap through a spatiotemporal autoregressive (STAR) architecture that combines the strengths of both paradigms. ...

Reinforcement Learning for End-to-End Autonomous Driving: From Offline DPO to Iterative Self-Improvement

中文版本：阅读中文版 Introduction The integration of reinforcement learning into end-to-end autonomous driving systems has emerged as a promising direction for improving trajectory planning beyond what supervised learning alone can achieve. However, the direct application of standard RL algorithms to driving tasks faces core challenges: the sim-to-real gap in log-replay environments, the computational bottleneck of online simulation, and the difficulty of defining dense reward signals for continuous trajectory generation. ...

Multi-Head Latent Attention: DeepSeek V2/V3 Engineering View

中文版本: 阅读中文版 This article focuses on engineering perspective. The mathematical derivation of MLA (from RoPE to latent projection, partial RoPE compatibility proof, weight absorption algebra) is in /posts/mathematics/position-encoding/mla-from-rope/. This article does not repeat the math — it discusses only the engineering numbers and design trade-offs that matter for DeepSeek V2/V3 deployment. Figure from DeepSeek-V2: A Strong, Economical, and Efficient MoE Language Model 1. Why DeepSeek Chose MLA: Engineering Motivation DeepSeek V2 / V3 [1] adopt MLA as the replacement for standard MHA — the root motivation is deployment economics. In LLM serving, the KV cache memory footprint directly determines how many concurrent requests fit on one card. For DeepSeek V2’s size (nh=128n_h = 128, dh=128d_h = 128, l=60l = 60), standard MHA caches 2nhdh=32,7682 n_h d_h = 32{,}768 elements per token per layer; across 60 layers, ~2M elements/token, ~4 MB/token in bf16. Under 32 K context, a single sequence consumes ~128 GB — one H100 80 GB cannot even fit it. ...

Alpamayo: Reasoning-Action Aligned VLA for Autonomous Driving

中文版本：阅读中文版 Introduction Figure from Alpamayo-R1: Bridging Reasoning and Action Prediction for Autonomous Driving End-to-end autonomous driving has made significant progress in recent years, yet deploying Vision-Language-Action (VLA) models in real-world driving scenarios remains challenging. The basic difficulties are fourfold. First, multi-frame temporal understanding requires the model to extract decision-relevant changes from highly redundant consecutive observations, rather than merely processing static snapshots. Second, driving decisions must be causal: the model must model why a particular action is taken, not just learn statistical correlations between situations and actions. Third, predicted trajectories must satisfy kinematic and dynamic constraints while remaining multi-modal and efficient enough for real-time inference. Fourth, the reasoning process must be tightly aligned with action output—reasoning should not be a post-hoc rationalization but must be verifiable by and constrained by the actual actions taken. ...

Policy Optimization for End-to-End Autonomous Driving: From REINFORCE to GRPO

中文版本：阅读中文版 1. Why End-to-End Driving Needs Reinforcement Learning Figure from AlphaDrive: GRPO-based RL for Autonomous Driving Supervised learning—whether through imitation learning or behavior cloning—can only take an autonomous driving system so far. The core limitation is distributional: the training data is drawn from expert demonstrations, and any distributional shift between training and deployment leads to compounding errors. More critically, supervised objectives are misaligned with the true goal of driving. Minimizing the L2 distance to a ground-truth trajectory penalizes safe deviations as harshly as dangerous ones, and provides no mechanism for the model to discover better trajectories than those in the dataset. ...

End-to-End Autonomous Driving: From Modular Decoders to VLA Architectures

中文版本：阅读中文版 Introduction The trajectory of autonomous driving architecture has undergone a paradigm shift: from the classical modular pipeline (perception →\to prediction →\to planning →\to control) toward end-to-end systems that map sensory inputs directly to driving actions. This transition is not merely an engineering convenience—it reflects a deep recognition that modular interfaces impose information bottlenecks and that joint optimization across the full stack can yield emergent capabilities invisible to individually optimized modules. ...

Trajectory Tokenization for Autoregressive Planning: Clustering, Matching, and the AR+Diffusion Paradigm

中文版本：阅读中文版 Figure from DiffusionDrive: Truncated Diffusion Model for End-to-End Autonomous Driving Autoregressive (AR) trajectory generation — predicting driving trajectories as sequences of discrete tokens, much like language models predict text — has emerged as a powerful paradigm for end-to-end autonomous driving. But how do we turn continuous trajectories into discrete tokens? How do we ensure the tokenized representation preserves enough fidelity for planning? And how does the AR paradigm combine with diffusion and reinforcement learning to produce state-of-the-art results? This article walks through the complete pipeline, from tokenization theory to RL post-training. ...

Why Generative Planning? The Non-Convexity Argument Against Regression in Autonomous Driving

中文版本：阅读中文版 Figure from DiffusionDrive: Truncated Diffusion Model for End-to-End Autonomous Driving The trajectory planner is the decision-making core of an autonomous driving system. Its task: given the current scene, output a future trajectory that is safe, comfortable, and efficient. Most production systems today use some form of regression — minimizing the distance between predicted and ground-truth trajectories. Yet a growing body of research and engineering evidence suggests this approach has a basic flaw: it assumes the feasible set is convex when it is emphatically not. This article lays out the first-principles argument for why generative approaches (diffusion, autoregressive) are necessary paradigm shifts, not merely improvements. ...