<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Policy-Gradient on Xu'Blog</title>
    <link>https://xuquant.com/tags/policy-gradient/</link>
    <description>Recent content in Policy-Gradient on Xu'Blog</description>
    <image>
      <title>Xu'Blog</title>
      <url>https://xuquant.com/images/profile.jpg</url>
      <link>https://xuquant.com/tags/policy-gradient/</link>
    </image>
    <generator>Hugo -- 0.152.2</generator>
    <language>en</language>
    <lastBuildDate>Wed, 30 Apr 2025 10:00:00 +0800</lastBuildDate>
    <atom:link href="https://xuquant.com/tags/policy-gradient/index.xml" rel="self" type="application/rss+xml"/>
    <item>
      <title>Policy Optimization for End-to-End Autonomous Driving: From REINFORCE to GRPO</title>
      <link>https://xuquant.com/posts/autodrive/rl-policy-optimization-e2e-driving/</link>
      <pubDate>Wed, 30 Apr 2025 10:00:00 +0800</pubDate>
      <guid>https://xuquant.com/posts/autodrive/rl-policy-optimization-e2e-driving/</guid>
      <description>A systematic derivation of policy optimization methods for end-to-end autonomous driving: from REINFORCE through PPO to GRPO, covering advantage estimation, sampling differences between LLMs and driving, multi-objective loss design, and the role of noise in diffusion-based exploration.</description>
    </item>
  </channel>
</rss>