<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>AI on Xu'Blog</title><link>https://xuquant.com/categories/ai/</link><description>Recent content in AI on Xu'Blog</description><image><title>Xu'Blog</title><url>https://xuquant.com/images/profile.jpg</url><link>https://xuquant.com/categories/ai/</link></image><generator>Hugo -- 0.152.2</generator><language>en</language><lastBuildDate>Fri, 08 May 2026 18:00:00 +0800</lastBuildDate><atom:link href="https://xuquant.com/categories/ai/index.xml" rel="self" type="application/rss+xml"/><item><title>ReflectDrive-2: Li Auto's Discrete Diffusion End-to-End Driving with Joint RL Optimization</title><link>https://xuquant.com/posts/reflectdrive-2-discrete-diffusion-end-to-end-driving/</link><pubDate>Fri, 08 May 2026 18:00:00 +0800</pubDate><guid>https://xuquant.com/posts/reflectdrive-2-discrete-diffusion-end-to-end-driving/</guid><description>An in-depth analysis of Li Auto's ReflectDrive-2: the first discrete diffusion model for end-to-end autonomous driving planning, introducing a three-stage "decide-draft-reflect" reasoning paradigm. Jointly optimizing drafting and editing with reinforcement learning amplifies the AutoEdit gain 6x, reaching a NAVSIM SOTA of 91.0 PDMS from camera-only input, with real-time deployment at 31.8 ms/frame on the Thor chip.</description></item><item><title>Kaiming He's Five CVPR 2026 Papers in Panorama: Multi-Angle Breakthroughs in the Flow Matching Paradigm</title><link>https://xuquant.com/posts/kaiming-he-cvpr2026-five-papers-flow-matching-breakthrough/</link><pubDate>Fri, 08 May 2026 18:00:00 +0800</pubDate><guid>https://xuquant.com/posts/kaiming-he-cvpr2026-five-papers-flow-matching-breakthrough/</guid><description>An in-depth reading of five CVPR 2026 papers from Kaiming He's team: JiT predicting clean images directly, VARC surpassing human performance in pure visual reasoning, BiFlow accelerating normalizing flows 700x, iMF reaching FID 1.72 with distillation-free single-step generation, and Pixo challenging DINOv3 head-on with 2-billion-pixel supervision. A systematic reconstruction of the "generative paradigm".</description></item><item><title>DeepSeek's Thinking with Visual Primitives: Teaching Multimodal LLMs to "Reason by Pointing"</title><link>https://xuquant.com/posts/deepseek-thinking-with-visual-primitives/</link><pubDate>Thu, 30 Apr 2026 20:00:00 +0800</pubDate><guid>https://xuquant.com/posts/deepseek-thinking-with-visual-primitives/</guid><description>An in-depth look at the "thinking with visual primitives" paradigm proposed by DeepSeek with Peking University and Tsinghua: replacing natural-language descriptions with coordinates and bounding boxes resolves referential ambiguity in multimodal reasoning, leading GPT-5.4 by 17 percentage points on maze navigation.</description></item><item><title>SceneVerse++: Lifting Unlabeled Internet Videos into 3D Scene Understanding Training Data</title><link>https://xuquant.com/posts/sceneverse-plus-data-engine-for-3d-scene-understanding/</link><pubDate>Thu, 30 Apr 2026 18:00:00 +0800</pubDate><guid>https://xuquant.com/posts/sceneverse-plus-data-engine-for-3d-scene-understanding/</guid><description>Deep analysis of CVPR 2026 SceneVerse++: how to build the largest-scale real-world 3D scene dataset from unlabeled internet videos, covering detection, segmentation, spatial VQA, and vision-language navigation.</description></item><item><title>Qwen3.5 vs Qwen3: A Deep Architectural Comparison</title><link>https://xuquant.com/posts/autodrive/qwen3-vs-qwen3-5-architecture/</link><pubDate>Wed, 29 Apr 2026 14:00:00 +0800</pubDate><guid>https://xuquant.com/posts/autodrive/qwen3-vs-qwen3-5-architecture/</guid><description>A deep architectural comparison of Qwen3.5 versus Qwen3, examining hybrid attention, native multimodal fusion, high-sparsity MoE, and partial RoPE across the attention, vision, and MoE dimensions.</description></item><item><title>CORAL: Autonomous Multi-Agent Evolution for Open-Ended Discovery</title><link>https://xuquant.com/posts/ai/coral-autonomous-multi-agent-evolution/</link><pubDate>Thu, 15 May 2025 10:00:00 +0800</pubDate><guid>https://xuquant.com/posts/ai/coral-autonomous-multi-agent-evolution/</guid><description>How delegating evolutionary search decisions to autonomous agents, rather than relying on fixed heuristics, enables faster convergence and stronger results across mathematical and systems optimization tasks.</description></item><item><title>InSpatio-World: Real-Time 4D World Simulation via Spatiotemporal Autoregressive Modeling</title><link>https://xuquant.com/posts/autodrive/inspatio-world-4d-simulator/</link><pubDate>Sun, 20 Apr 2025 10:00:00 +0800</pubDate><guid>https://xuquant.com/posts/autodrive/inspatio-world-4d-simulator/</guid><description>A deep
technical analysis of InSpatio-World: a 1.3B-parameter real-time 4D world simulator that combines implicit spatiotemporal caching with explicit geometric constraints, achieving 24 FPS novel-view synthesis from monocular video.</description></item><item><title>Multi-Head Latent Attention: Efficient KV Cache Compression in DeepSeek-V2</title><link>https://xuquant.com/posts/autodrive/deepseek_series1_mla/</link><pubDate>Sat, 15 Feb 2025 10:00:00 +0800</pubDate><guid>https://xuquant.com/posts/autodrive/deepseek_series1_mla/</guid><description>Deep technical analysis of Multi-Head Latent Attention (MLA) from DeepSeek-V2, covering low-rank KV cache compression, decoupled RoPE design, and computational cost comparison with MHA, MQA, and GQA.</description></item></channel></rss>