LexXu

Algorithm Engineer · Autonomous Driving · VLA / World Models

About Me

I’m LexXu — an algorithm engineer, six years deep into autonomous driving. I’ve walked the full arc from rule-based PNC to NN-Planner to VLA foundation models, and now lead the VLA team at QianLi Tech (formerly Megvii Research), shipping models on the Thor platform at 10 Hz.

Most of what’s on this site, though, is written for myself first. Papers don’t stick unless I retype them; equations don’t internalize unless I derive them by hand. Everything you find here is really just the residue of me wrestling with some idea long enough that it left a trace.

The math series is the part I care about most. There’s a kind of math underneath modern ML — the bedrock kind — that I’ll admit defeats me on first read, even when good writing has already laid it out. So I’ve been re-translating it: in language I can follow, with visualizations I can picture. For myself, mostly, and for engineer-readers who feel the same way.

The deeper pull is that math is a way of seeing. Every once in a while one piece clicks and the world quietly rearranges itself, and that small “oh, that’s how it runs” feeling is most of why I keep doing this. The knowledge graph here is my attempt to make it systematic — woven, instead of scattered.

Off the keyboard: films, music, meditation and yoga, travel — I’d like to see most of this world before I’m done. Life, as far as I can tell, is mostly an experience worth being awake for.

Experience

QianLi Tech (formerly Megvii Research) — VLA Team Lead

Algorithm Lead

Feb 2025 – Present

Lead the VLA group: end-to-end driving with NN Planner upgrades and a native multimodal VLA, both deployed on-vehicle.

NN Planner Module (Feb 2025 – Nov 2025)

  • Owned the NN-Planner module for one-stage end-to-end driving, consuming perception features directly. Reproduced and stress-tested multiple decoder designs on the production line.
  • Upgraded the legacy ARDecoder V1 to V2, fixing three structural issues (hard-to-supervise high-order kinematic targets, poor VRU modeling, motion-information loss under frame downsampling) by clustering trajectory states via k-means into a finite Trajectory Vocabulary, which removes the inverse-kinematics error and the need for a hand-coded motion model. Kept joint prediction/planning, scene simulation, and interactivity intact, and shipped on-vehicle.
  • Iterated a Flow Matching variant in parallel — strong frame-to-frame stability and comfort, but ODE sampling traded off trajectory diversity and slowed RL convergence.
  • Open-set AR + Diffusion fusion: AR proposes anchors (cold-start fix), Diffusion refines them (discretization-drift fix). NavSim 94.85, above human-driver GT. Paper submitted to NeurIPS.
  • Built the evaluation module from scratch — six axes (safety / comfort / efficiency / regulation / trajectory / navigation) — doubling as a rule-based reward model for the RL stack.

VLA Project (Dec 2025 – Present)

  • Project lead. Owned the full technology roadmap, model design, and on-vehicle deployment for a VLA foundation model targeting unstructured / long-tail / generalization scenarios.
  • Three architectural iterations: VLM + discrete action tokens → continuous Flow-Matching planner → native multimodal VLA. Co-tuned prompts and base models; Thor-platform 10 Hz deployment.
  • Compact navigation encoding (turn-icon / distance / waypoints) with layered injection: navigation efficiency +70 %.
  • Multi-view spatiotemporal vision encoder at the same token budget and inference cost: temporal bad-case fix rate 60 %, frame-to-frame consistency +30 %.
  • Frame-level atomic information pipeline — navigation, road topology, and LLM-arbitrated key-object selection / meta-action labeling: label accuracy 60 % → 90 %+.
  • For dense supervision at scale, adopted a V-JEPA pretraining regime with causal future masking and implicit latent prediction: NavSim SOTA without RL (paper submitted to NeurIPS). Private-set metrics: total +5 %, collision +10 %, bad-case pass-rate +20 %.

NIO — Autonomous Driving R&D, Foundation Models Division

Senior Algorithm Engineer

Aug 2023 – Feb 2025

Object-detection (OD) dataset curation, automated QC, multimodal models, and diffusion-based planning.

OD Dataset Curation (Aug 2023 – Mar 2024)

  • Active learning via gradient similarity to capture the reasoning-process diversity of a 100M-scale legacy + incremental OD dataset.
  • 20 % curated data matches full-set accuracy, 5× training speedup; −90 % annotation cost on incremental data (10× efficiency).

Annotation Pipeline QC (Apr 2024 – Jul 2024)

  • Specialized QC model targeting the two most common auto-labeling defects (bbox-mismatch, heading error), operating on cropped patches.
  • Precision 92 %, Recall 89 % on the gold eval set. Downstream OD model gained +1 pp overall and +70 % recall on near-range mis-detections. Two patents.

VLM for Trajectory Tasks (Jun 2024 – Nov 2024)

  • Adapted LLaVA-Next for driving: replaced backbone with DINOv2-base, designed a 3D Q-Former for vision-text alignment, built a custom VQA dataset for 3D reasoning.
  • VLM-augmented planner handles complex / fallback / long-tail scenarios stably. One patent.

Diffusion-Based Planner (Oct 2024 – Feb 2025)

  • Inspired by Diffusion Policy in robotics. Transformer noise model on a DDPM schedule, predicting 7 s of ego trajectory end-to-end.
  • Smooth and controllable long-horizon prediction; naturally fits multi-modal distributions.

ECARX Tech — Senior Algorithm Engineer

Mar 2022 – Aug 2023

Parking Perception (mass-production)

  • 4-way fisheye object detection on a transformer backbone with BEV transform, for the Lynk & Co automatic-parking program. Diverse augmentations for lighting / scale: mAP 66.5 %.
  • Deployed on Black Sesame A1000, accuracy-aligned with PC. Hardware acceleration + multithreaded async pipeline kept end-to-end perception at 15 Hz.

Institute of Software, CAS — State Key Lab

Algorithm Engineer

Jan 2021 – Feb 2022

  • RL-based driving decisions over diverse complex scenarios via a self-built Gym sim platform. >98 % decision accuracy; safe lane-change / accel-decel behavior.
  • Teaching assistant for the Reinforcement Learning course at UCAS Hangzhou Institute — courseware and lab platform.

Waytous Intelligent — Algorithm Intern

Jan 2020 – Aug 2020

  • Trajectory-prediction dataset construction over millions of trajectories from open-source / vehicle / cloud sources: end-to-end automation from ingestion → filtering → format alignment → storage → distribution analysis → visualization.

Education

Sun Yat-sen University

PhD, Computer Science and Technology

Aug 2020 – Jan 2021

Beihang University

MSc, Control Science and Engineering

Sep 2017 – Jan 2020

Northeastern University

BSc, Automation

Sep 2011 – Jul 2015

Skills

  • Languages & tools. Python, C++, PyTorch, Git, Docker, Shell. Productive with modern AI tooling.
  • Autonomous driving. Full pipeline from rule-based PNC → NN-Planner → VLA foundation models. Vision perception, temporal understanding, VLM pretraining, small-sample fine-tuning, on-vehicle deployment.
  • Management. ~1 year leading the VLA team — team building, organizational growth, project execution.

Awards & Activities

  • Northeastern University: Self-reliance Scholarship ×1; Academic Scholarship ×3.
  • Beihang University: Academic Scholarship ×2.
  • 8 publications (2 as first author); 1 Outstanding Paper award.
  • China FSAE Driverless — BUAA AERO Team, Perception Lead.