ReconDreamer-RL: Enhancing Reinforcement Learning via Diffusion-based Scene Reconstruction

  • Chaojun Ni*1,2
  • Guosheng Zhao*1,3
  • Xiaofeng Wang*1
  • Zheng Zhu*1
  • Wenkang Qin1
  • Xinze Chen1
  • Guanghong Jia4
  • Guan Huang1
  • Wenjun Mei2
  • 1 GigaAI
  • 2 Peking University
  • 3 CASIA
  • 4 Tsinghua University

Abstract

Reinforcement learning for training end-to-end autonomous driving models in closed-loop simulation is attracting growing attention. However, most simulation environments differ significantly from real-world conditions, creating a substantial simulation-to-reality (sim2real) gap. To bridge this gap, some approaches use scene reconstruction techniques to build photorealistic environments that serve as simulators. While this improves the realism of sensor simulation, these methods are inherently constrained by the distribution of the training data, making it difficult to render high-quality sensor data for novel trajectories or corner-case scenarios. We therefore propose ReconDreamer-RL, a framework that integrates video diffusion priors into scene reconstruction to aid reinforcement learning, thereby enhancing end-to-end autonomous driving training. Specifically, ReconDreamer-RL introduces ReconSimulator, which combines a video diffusion prior for appearance modeling with a kinematic model for physical modeling, reconstructing driving scenarios from real-world data and narrowing the sim2real gap for closed-loop evaluation and reinforcement learning. To cover more corner-case scenarios, we introduce the Dynamic Adversary Agent (DAA), which adjusts the trajectories of surrounding vehicles relative to the ego vehicle, autonomously generating corner-case traffic scenarios (e.g., cut-ins). Finally, the Cousin Trajectory Generator (CTG) is proposed to counter the bias of the training data distribution toward simple straight-line movements. Experiments show that ReconDreamer-RL improves end-to-end autonomous driving training, outperforming imitation learning methods with a 5× reduction in the Collision Ratio.

Pipeline

In ReconDreamer-RL, ReconSimulator builds on ReconDreamer to improve appearance modeling and incorporates physical modeling to reconstruct driving scenes. In the imitation learning stage, DAA generates corner-case scenario trajectories while CTG diversifies the ego vehicle's actions, and ReconSimulator renders the corresponding sensor data for training the policy. In the reinforcement learning stage, the policy is trained in a closed-loop environment, interacting with DAA-controlled surrounding vehicles, as sketched below.
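The two-stage procedure can be summarized in pseudocode. The sketch below is illustrative only: every object and method name (`policy`, `recon_simulator`, `daa`, `ctg`, and their interfaces) is a hypothetical placeholder for the components described above, not the authors' released API.

```python
# Illustrative two-stage training loop; all interfaces are assumed.
def train_policy(policy, recon_simulator, daa, ctg, il_steps, rl_steps):
    # Stage 1: imitation learning on rendered sensor data.
    for _ in range(il_steps):
        adversary_trajs = daa.generate_corner_case_trajectories()
        ego_trajs = ctg.generate_cousin_trajectories()
        batch = recon_simulator.render(ego_trajs, adversary_trajs)
        policy.imitation_update(batch)

    # Stage 2: closed-loop RL against DAA-controlled surrounding vehicles.
    for _ in range(rl_steps):
        obs = recon_simulator.reset(adversaries=daa)
        done = False
        while not done:
            action = policy.act(obs)
            obs, reward, done = recon_simulator.step(action)
            policy.rl_update(obs, action, reward, done)
    return policy
```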

ReconSimulator

The process of integrating the diffusion prior for appearance modeling. During the reconstruction of driving scenes, we first render videos along novel trajectories. These rendered videos are then processed by DriveRestorer to enhance their visual quality, and the restored results are used to further optimize the reconstruction model. This iterative process continues until the reconstruction model converges.
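A minimal sketch of this render-restore-optimize loop, assuming simple `render`, `restore`, and `optimize` interfaces (these are expository placeholders, not the released implementation; DriveRestorer itself follows ReconDreamer):

```python
# Iteratively refine the reconstruction with diffusion-restored renderings.
def refine_reconstruction(recon_model, restorer, novel_trajectories,
                          max_iters=10, tol=1e-4):
    prev_loss = float("inf")
    for _ in range(max_iters):
        # 1. Render videos along the novel (shifted) trajectories.
        rendered = [recon_model.render(traj) for traj in novel_trajectories]
        # 2. Restore rendering artifacts with the diffusion-based restorer.
        restored = [restorer.restore(video) for video in rendered]
        # 3. Use the restored videos as pseudo ground truth to optimize the
        #    reconstruction model; optimize is assumed to return a scalar loss.
        loss = recon_model.optimize(list(zip(novel_trajectories, restored)))
        # 4. Stop once the loss change falls below tol (convergence).
        if abs(prev_loss - loss) < tol:
            break
        prev_loss = loss
    return recon_model
```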

Dynamic Adversary Agent

The pipeline of the DAA. DAA identifies target vehicles based on their distance to the ego car in the BEV view, where the blue line represents the ego car's trajectory and the red line represents the target vehicle's trajectory. DAA then generates novel trajectories according to the specified interactive behavior. The generated trajectories are checked, and feasible ones are rendered using ReconSimulator.
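These steps can be sketched as follows. Trajectories are assumed to be time-aligned (T, 2) arrays of BEV waypoints, and `make_adversarial_traj`, `is_feasible`, and the 20 m distance threshold are illustrative assumptions standing in for the behavior generation and feasibility checks:

```python
import numpy as np

def run_daa(ego_traj, agent_trajs, simulator, make_adversarial_traj,
            is_feasible, max_dist=20.0):
    # 1. Select target vehicles whose trajectories come within max_dist
    #    metres of the ego trajectory in the BEV plane.
    targets = [
        traj for traj in agent_trajs
        if np.min(np.linalg.norm(traj - ego_traj, axis=-1)) < max_dist
    ]
    rendered = []
    for target in targets:
        # 2. Bend the target's path toward the ego lane to realize the
        #    specified interactive behavior (e.g., a cut-in).
        new_traj = make_adversarial_traj(target, ego_traj)
        # 3. Keep only kinematically feasible, on-road trajectories.
        if is_feasible(new_traj):
            # 4. Render the modified scene with ReconSimulator.
            rendered.append(simulator.render_agent(new_traj))
    return rendered
```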

Examples of Dynamic Adversary Agent (DAA) controlling surrounding vehicles to simulate cut-in scenarios.

Cousin Trajectory Generator

Cousin Trajectory Generator (CTG) generates cousin trajectories, performs trajectory checks to eliminate unreasonable ones (e.g., those marked with pink crosses), and finally renders the corresponding sensor data with ReconSimulator.
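A toy sketch of this generate-check-render loop, assuming the ego trajectory is a (T, 2) array of waypoints; the lateral offsets, the normal-shift construction, and the caller-supplied `trajectory_check` are assumptions for exposition, not the paper's exact procedure:

```python
import numpy as np

def shift_laterally(traj, offset):
    # Shift each waypoint along the local path normal (toy cousin trajectory).
    tangents = np.gradient(traj, axis=0)
    normals = np.stack([-tangents[:, 1], tangents[:, 0]], axis=1)
    normals /= np.linalg.norm(normals, axis=1, keepdims=True) + 1e-8
    return traj + offset * normals

def generate_cousin_trajectories(ego_traj, simulator, trajectory_check,
                                 lateral_offsets=(-2.0, -1.0, 1.0, 2.0)):
    rendered = []
    for offset in lateral_offsets:
        cousin = shift_laterally(ego_traj, offset)
        # Trajectory check: drop variants that leave the road or collide
        # with other agents (the pink-cross cases in the figure).
        if trajectory_check(cousin):
            rendered.append(simulator.render(cousin))
    return rendered
```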

Qualitative Examples

Comparison of different methods in challenging corner cases, with collisions highlighted by orange boxes.