Paper-Reading

CORAL: Autonomous Multi-Agent Evolution for Open-Ended Discovery

Open-ended discovery (the search for novel, high-quality solutions in domains where the solution space lacks clear structure and evaluation may be expensive or sparse) remains one of the hardest challenges in automated scientific reasoning. Unlike constrained optimization, where gradients or convexity guide the search, open-ended problems demand sustained exploration, accumulation of partial insights, and the ability to redirect effort when progress stalls. Mathematical conjecture proving, systems-level code optimization, and combinatorial design all fall squarely in this category. ...

InSpatio-World: Real-Time 4D World Simulation via Spatiotemporal Autoregressive Modeling

The ability to simulate a 4D world, one that evolves in time and can be viewed from arbitrary perspectives, is a foundational capability for autonomous driving, robotics, and embodied AI. Existing video generation models produce visually compelling sequences but lack spatial consistency when the camera moves. 3D reconstruction methods achieve geometric fidelity but struggle with dynamic scenes and real-time performance. InSpatio-World bridges this gap through a spatiotemporal autoregressive (STAR) architecture that combines the strengths of both paradigms. ...

Alpamayo: Reasoning-Action Aligned VLA for Autonomous Driving

End-to-end autonomous driving has made significant progress in recent years, yet deploying Vision-Language-Action (VLA) models in real-world driving scenarios remains challenging. The basic difficulties are fourfold. First, multi-frame temporal understanding requires the model to extract decision-relevant changes from highly redundant consecutive observations, rather than merely processing static snapshots. Second, driving decisions must be causal: the model must capture why a particular action is taken, not just learn statistical correlations between situations and actions. Third, predicted trajectories must satisfy kinematic and dynamic constraints while remaining multi-modal and efficient enough for real-time inference. Fourth, the reasoning process must be tightly aligned with the action output: reasoning should not be a post-hoc rationalization but must be verifiable against, and constrained by, the actions actually taken. ...