Reinforcement Learning for End-to-End Autonomous Driving: From Offline DPO to Iterative Self-Improvement
Introduction

The integration of reinforcement learning into end-to-end autonomous driving systems has emerged as a promising direction for improving trajectory planning beyond what supervised learning alone can achieve. However, the direct application of standard RL algorithms to driving tasks faces core challenges: the sim-to-real gap in log-replay environments, the computational bottleneck of online simulation, and the difficulty of defining dense reward signals for continuous trajectory generation. ...