JEPA on Xu'Blog

Recent content in JEPA on Xu'Blog: https://xuquant.com/tags/jepa/

Dense Latent Predictive Supervision in AD VLA: Why Pixels Are Not Optimal
https://xuquant.com/posts/autonomous-driving/dense-latent-predictive-supervision/
Sun, 24 May 2026 10:00:00 +0800
AD VLA supervises a 2B+ parameter backbone with a sparse trajectory loss (12 waypoints × 2D = 24 scalars), an information-theoretic ratio of ~10⁻¹⁰. This supervision deficit is the core reason NAVSIM scores have stalled in the 87-93 range. DriveVLA-W0 fills the gap with pixel-level future image prediction, which is the right direction but not the optimal route. V-JEPA-style latent predictive supervision is friendlier on three counts: capacity, inference cost, and structural match with evaluation.

Driving JEPA Survey: Applying the V-JEPA Family to Autonomous Driving
https://xuquant.com/posts/world-models/driving-jepa/
Sat, 21 Feb 2026 10:00:00 +0800
A survey of how the V-JEPA family transfers to autonomous driving benchmarks: a comparison of fine-tuning results for driving-specific variants such as causal future masking, motion-aware masks, and temporal-coherent masks, plus the fundamental mismatch in masking assumptions between driving and general-purpose video self-supervision.

LeJEPA: When JEPA No Longer Needs Heuristics
https://xuquant.com/posts/world-models/lejepa/
Sat, 07 Feb 2026 10:00:00 +0800
LeJEPA takes JEPA from an engineering artifact that depends on a string of heuristics (stop-gradient, teacher-student, EMA) back to a provably optimal theoretical framework: SIGReg uses random slicing to align the embedding distribution to an isotropic Gaussian, with a single hyperparameter, linear complexity, and about 50 lines of code. This post places that result within the methodological lineage of collapse prevention in JEPA and explains why it is the direction LeCun personally endorsed in a 2025 interview.

V-JEPA 2.1: When Self-Supervised Vision Learns to See Every Pixel
https://xuquant.com/posts/world-models/vjepa-2.1/
Sat, 10 Jan 2026 10:00:00 +0800
A deep analysis of V-JEPA 2.1's architectural innovations (dense predictive loss, deep self-supervision, multi-modal tokenizer, and scaling), tracing the path from collapsed context tokens to dense features that encode spatial structure, and the connection to depth estimation as geometric grounding.