Offline Reinforcement Learning Strategy for Floor Cleaning Robots Based on Conservative Q-Learning Algorithm
Feb 1, 2023
·
1 min read

Project Duration: Sep. 2022 to Feb. 2023
Project Details
- Within a ROS Noetic environment, the offline reinforcement learning algorithm CQL (Conservative Q-Learning) is applied, introducing a conservative constraint into the Q-value updates (the corresponding critic objective is sketched at the end of this post)
- Several domestic simulation environments are built in Gazebo, and reference paths are generated with the A* algorithm to collect trajectory data (a minimal planner sketch follows this list)
- RViz is used to view and analyze the robot's trajectories, and the optimal paths are annotated manually
- The annotated data serves as a supervisory signal for training the model, and the learned parameters are saved
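
As a rough illustration of the planning step above, the following sketch runs A* on a binary occupancy grid; the 0/1 grid encoding, 4-connected motion model, and unit step cost are simplifying assumptions rather than the project's exact planner configuration.

```python
# Minimal grid-based A* sketch used to generate reference paths for data
# collection. Assumes a binary occupancy grid (0 = free, 1 = obstacle),
# 4-connected moves, and unit step cost -- illustrative only.
import heapq

def a_star(grid, start, goal):
    """Return a list of (row, col) cells from start to goal, or None."""
    rows, cols = len(grid), len(grid[0])
    heuristic = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    open_set = [(heuristic(start), 0, start, None)]  # (f, g, cell, parent)
    came_from, g_cost = {}, {start: 0}

    while open_set:
        _, g, current, parent = heapq.heappop(open_set)
        if current in came_from:      # skip stale entries for expanded cells
            continue
        came_from[current] = parent
        if current == goal:           # reconstruct by walking back through parents
            path = []
            while current is not None:
                path.append(current)
                current = came_from[current]
            return path[::-1]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (current[0] + dr, current[1] + dc)
            if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols):
                continue
            if grid[nxt[0]][nxt[1]] == 1:
                continue
            new_g = g + 1
            if new_g < g_cost.get(nxt, float("inf")):
                g_cost[nxt] = new_g
                heapq.heappush(open_set, (new_g + heuristic(nxt), new_g, nxt, current))
    return None
```

Each planned path can then be replayed in the Gazebo scene and logged as (state, action, reward, next state) transitions to build the offline training set.
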
Achievements
A training set of simulated paths for the vacuum-cleaning robot has been collected in Gazebo, and the conservative coefficient α has been tuned to optimize the CQL (Conservative Q-Learning) model.
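
For reference, the role of α in the conservative Q-value update can be summarized by the critic loss below. This is a minimal PyTorch sketch assuming a discretized action space (a small set of velocity commands) and illustrative network sizes and hyperparameters, not the project's exact implementation.

```python
# Minimal sketch of a discrete-action CQL critic loss (PyTorch).
# Batch layout, network shape, and hyperparameters are illustrative
# assumptions, not the project's actual configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class QNetwork(nn.Module):
    def __init__(self, state_dim, num_actions, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_actions),
        )

    def forward(self, state):
        return self.net(state)  # Q(s, a) for every discrete action

def cql_critic_loss(q_net, target_net, batch, alpha=1.0, gamma=0.99):
    s, a, r, s_next, done = batch  # a: LongTensor of action indices

    # Standard Bellman (TD) error on the logged transitions.
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + gamma * (1 - done) * target_net(s_next).max(dim=1).values
    td_loss = F.mse_loss(q_sa, target)

    # Conservative term: push down Q-values over all actions (log-sum-exp)
    # while pushing up Q-values of the actions seen in the dataset.
    conservative = (torch.logsumexp(q_net(s), dim=1) - q_sa).mean()

    return td_loss + alpha * conservative
```

Increasing α strengthens the penalty on actions outside the logged data, keeping the learned policy closer to the annotated A*-style trajectories, while smaller values behave more like standard Q-learning; the tuning mentioned above trades these off.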