<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Self-Supervised on Xu'Blog</title><link>https://xuquant.com/tags/self-supervised/</link><description>Recent content in Self-Supervised on Xu'Blog</description><image><title>Xu'Blog</title><url>https://xuquant.com/og-default.png</url><link>https://xuquant.com/og-default.png</link></image><generator>Hugo -- 0.152.2</generator><language>zh</language><lastBuildDate>Sun, 24 May 2026 10:00:00 +0800</lastBuildDate><atom:link href="https://xuquant.com/tags/self-supervised/index.xml" rel="self" type="application/rss+xml"/>
<item><title>Dense Latent Predictive Supervision in AD VLA: Why Pixels Are Not Optimal</title><link>https://xuquant.com/posts/autonomous-driving/dense-latent-predictive-supervision/</link><pubDate>Sun, 24 May 2026 10:00:00 +0800</pubDate><guid>https://xuquant.com/posts/autonomous-driving/dense-latent-predictive-supervision/</guid><description>AD VLA models supervise a 2B+ parameter backbone with a sparse trajectory loss (12 waypoints × 2D = 24 scalars), an information-theoretic ratio of ~10⁻¹⁰. This supervision deficit is the core reason scores have stalled in the 87-93 range on NAVSIM. DriveVLA-W0 compensates with pixel-level future image prediction: the right direction, but not the optimal route. V-JEPA-style latent predictive supervision is friendlier on all three counts: capacity, inference cost, and structural alignment with evaluation.</description></item>
<item><title>Kaiming He's Methodology: From ResNet to iMF, the Research Path of a Relentless Questioner of Fundamentals</title><link>https://xuquant.com/posts/foundation-models/kaiming-he-cvpr2026-five-papers-flow-matching-breakthrough/</link><pubDate>Sat, 18 Apr 2026 18:00:00 +0800</pubDate><guid>https://xuquant.com/posts/foundation-models/kaiming-he-cvpr2026-five-papers-flow-matching-breakthrough/</guid><description>A close reading of Kaiming He's CVPR 2026 work, centered on iMF (Improved Mean Flow, arXiv:2512.02012) and placed back into the decade-long lineage of ResNet / MoCo / MAE / SiT. It distills four recurring strands of methodological DNA: simplicity taken to the extreme, changing the problem's assumptions, strong priors with few assumptions, and decoupling the method from the task. Closely linked to the mathematics/diffusion series.</description></item>
<item><title>LeJEPA: When JEPA No Longer Needs Heuristics</title><link>https://xuquant.com/posts/world-models/lejepa/</link><pubDate>Sat, 07 Feb 2026 10:00:00 +0800</pubDate><guid>https://xuquant.com/posts/world-models/lejepa/</guid><description>LeJEPA takes JEPA from an engineering artifact that depends on a series of heuristics (stop-gradient, teacher-student, EMA) back to a provably optimal theoretical framework. Its SIGReg regularizer uses random slicing to align the embedding distribution with an isotropic Gaussian, with a single hyperparameter, linear complexity, and about 50 lines of code. This post places that result in the methodological lineage of JEPA's anti-collapse techniques and explains why it is the direction LeCun personally endorsed in a 2025 interview.</description></item>
<item><title>DINOv3: The Scaling Dilemma of Self-Supervised Vision Foundation Models and the Gram Anchoring Breakthrough</title><link>https://xuquant.com/posts/world-models/dinov3/</link><pubDate>Sat, 24 Jan 2026 10:00:00 +0800</pubDate><guid>https://xuquant.com/posts/world-models/dinov3/</guid><description>An analysis of DINOv3's core contributions: how Gram anchoring solves the fundamental problem of dense-feature degradation in large-scale self-supervised training, the training engineering behind a 7B-parameter SSL model, and what its breakthroughs in depth estimation and 3D matching mean.</description></item>
<item><title>V-JEPA 2.1: When Self-Supervised Vision Learns to See Every Pixel</title><link>https://xuquant.com/posts/world-models/vjepa-2.1/</link><pubDate>Sat, 10 Jan 2026 10:00:00 +0800</pubDate><guid>https://xuquant.com/posts/world-models/vjepa-2.1/</guid><description>A deep analysis of V-JEPA 2.1&amp;#39;s architectural innovations (dense predictive loss, deep self-supervision, multi-modal tokenizer, and scaling), tracing the path from collapsed context tokens to dense features that encode spatial structure, and the connection to depth estimation as geometric grounding.</description></item>
<item><title>ReconVLA: Visual Grounding for VLA via Gaze-Crop Reconstruction</title><link>https://xuquant.com/posts/foundation-models/reconvla-gaze-crop-implicit-grounding/</link><pubDate>Mon, 27 Oct 2025 22:00:00 +0800</pubDate><guid>https://xuquant.com/posts/foundation-models/reconvla-gaze-crop-implicit-grounding/</guid><description>OpenHelix's ReconVLA (arXiv:2508.10333) attaches a 3-layer DiT after an OpenVLA-style backbone. It uses reconstruction of the gaze-crop's VAE latent as auxiliary supervision, anchoring the VLA's attention on the target object. This post reads the paper side by side with the open-source code. It covers engineering details the paper does not emphasize and several questions the paper leaves unanswered: there is no recon-on/off ablation, and the &amp;#39;implicit grounding&amp;#39; actually depends on offline YOLO bboxes for its training supervision.</description></item>
</channel></rss>