Ryan Julian (University of Southern California)*; Benjamin Swanson (Google); Gaurav Sukhatme (University of Southern California); Sergey Levine (Google); Chelsea Finn (Google Brain); Karol Hausman (Google Brain)
2020-11-18, 11:50 - 12:20 PST | PheedLoop Session
One of the great promises of robot learning systems is that they will be able to learn from their mistakes and continuously adapt to ever-changing environments. Despite this potential, most of the robot learning systems today produce static policies that are not further adapted during deployment, because the algorithms which produce those policies are not designed for continual adaptation. We present an adaptation method, and empirical evidence that it supports a robot learning framework for continual adaption. We show that this very simple method–fine-tuning off-policy reinforcement learning using offline datasets–is robust to changes in background, object shape and appearance, lighting conditions, and robot morphology. We demonstrate how to adapt vision-based robotic manipulation policies to new variations using less than 0.2% of the data necessary to learn the task from scratch. Furthermore, we demonstrate that this robustness holds in an episodic continual learning setting. We also show that pre-training via RL is essential: training from scratch or adapting from super vised ImageNet features are both unsuccessful with such small amounts of data. Our empirical conclusions are consistently supported by experiments on simulated manipulation tasks, and by 60 unique fine-tuning experiments on a real robotic grasping system pre-trained on 580,000 grasps. For video results and an overview of the methods and experiments in this study, see the project website at https://ryanjulian.me/continual-fine-tuning.