<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>AI on Xu'Blog</title><link>https://xuquant.com/categories/ai/</link><description>Recent content in AI on Xu'Blog</description><image><title>Xu'Blog</title><url>https://xuquant.com/images/profile.jpg</url><link>https://xuquant.com/categories/ai/</link></image><generator>Hugo -- 0.152.2</generator><language>en</language><lastBuildDate>Fri, 08 May 2026 18:00:00 +0800</lastBuildDate><atom:link href="https://xuquant.com/categories/ai/index.xml" rel="self" type="application/rss+xml"/><item><title>ReflectDrive-2: Li Auto's Discrete Diffusion End-to-End Driving with Joint RL Optimization</title><link>https://xuquant.com/posts/reflectdrive-2-discrete-diffusion-end-to-end-driving/</link><pubDate>Fri, 08 May 2026 18:00:00 +0800</pubDate><guid>https://xuquant.com/posts/reflectdrive-2-discrete-diffusion-end-to-end-driving/</guid><description>An in-depth analysis of Li Auto's ReflectDrive-2: the first discrete diffusion model for end-to-end autonomous driving planning, introducing a three-stage "decide-draft-reflect" reasoning paradigm. Jointly optimizing drafting and editing with reinforcement learning amplifies the AutoEdit gain 6x, reaching a NAVSIM SOTA of 91.0 PDMS from camera-only input, with real-time deployment at 31.8 ms/frame on the Thor chip.</description></item><item><title>Kaiming He's Five CVPR 2026 Papers in Panorama: Multi-Angle Breakthroughs in the Flow Matching Paradigm</title><link>https://xuquant.com/posts/kaiming-he-cvpr2026-five-papers-flow-matching-breakthrough/</link><pubDate>Fri, 08 May 2026 18:00:00 +0800</pubDate><guid>https://xuquant.com/posts/kaiming-he-cvpr2026-five-papers-flow-matching-breakthrough/</guid><description>An in-depth reading of five CVPR 2026 papers from Kaiming He's team: JiT predicting clean images directly, VARC surpassing human performance in pure visual reasoning, BiFlow accelerating normalizing flows 700x, iMF reaching FID 1.72 with distillation-free single-step generation, and Pixo challenging DINOv3 head-on with 2-billion-pixel supervision. A systematic reconstruction of the "generative paradigm".</description></item><item><title>DeepSeek's Thinking with Visual Primitives: Teaching Multimodal LLMs to "Reason by Pointing"</title><link>https://xuquant.com/posts/deepseek-thinking-with-visual-primitives/</link><pubDate>Thu, 30 Apr 2026 20:00:00 +0800</pubDate><guid>https://xuquant.com/posts/deepseek-thinking-with-visual-primitives/</guid><description>An in-depth look at the "thinking with visual primitives" paradigm proposed by DeepSeek with Peking University and Tsinghua: replacing natural-language descriptions with coordinates and bounding boxes resolves referential ambiguity in multimodal reasoning, leading GPT-5.4 by 17 percentage points on maze navigation.</description></item><item><title>SceneVerse++: Lifting Unlabeled Internet Videos into 3D Scene Understanding Training Data</title><link>https://xuquant.com/posts/sceneverse-plus-data-engine-for-3d-scene-understanding/</link><pubDate>Thu, 30 Apr 2026 18:00:00 +0800</pubDate><guid>https://xuquant.com/posts/sceneverse-plus-data-engine-for-3d-scene-understanding/</guid><description>Deep analysis of CVPR 2026 SceneVerse++: how to build the largest-scale real-world 3D scene dataset from unlabeled internet videos, covering detection, segmentation, spatial VQA, and vision-language navigation.</description></item><item><title>Qwen3.5 vs Qwen3: A Deep Architectural Comparison</title><link>https://xuquant.com/posts/autodrive/qwen3-vs-qwen3-5-architecture/</link><pubDate>Wed, 29 Apr 2026 14:00:00 +0800</pubDate><guid>https://xuquant.com/posts/autodrive/qwen3-vs-qwen3-5-architecture/</guid><description>A deep architectural comparison of Qwen3.5 versus Qwen3, examining hybrid attention, native multimodal fusion, high-sparsity MoE, and partial RoPE across the attention, vision, and MoE dimensions.</description></item><item><title>CORAL: Autonomous Multi-Agent Evolution for Open-Ended Discovery</title><link>https://xuquant.com/posts/ai/coral-autonomous-multi-agent-evolution/</link><pubDate>Thu, 15 May 2025 10:00:00 +0800</pubDate><guid>https://xuquant.com/posts/ai/coral-autonomous-multi-agent-evolution/</guid><description>How delegating evolutionary search decisions to autonomous agents, rather than relying on fixed heuristics, enables faster convergence and stronger results across mathematical and systems optimization tasks.</description></item><item><title>InSpatio-World: Real-Time 4D World Simulation via Spatiotemporal Autoregressive Modeling</title><link>https://xuquant.com/posts/autodrive/inspatio-world-4d-simulator/</link><pubDate>Sun, 20 Apr 2025 10:00:00 +0800</pubDate><guid>https://xuquant.com/posts/autodrive/inspatio-world-4d-simulator/</guid><description>A deep
technical analysis of InSpatio-World: a 1.3B-parameter real-time 4D world simulator that combines implicit spatiotemporal caching with explicit geometric constraints, achieving 24 FPS novel-view synthesis from monocular video.</description></item><item><title>Multi-Head Latent Attention: Efficient KV Cache Compression in DeepSeek-V2</title><link>https://xuquant.com/posts/autodrive/deepseek_series1_mla/</link><pubDate>Sat, 15 Feb 2025 10:00:00 +0800</pubDate><guid>https://xuquant.com/posts/autodrive/deepseek_series1_mla/</guid><description>Deep technical analysis of Multi-Head Latent Attention (MLA) from DeepSeek-V2, covering low-rank KV cache compression, decoupled RoPE design, and computational cost comparison with MHA, MQA, and GQA.</description></item></channel></rss>