NTR Webinar: Offline Imitation Learning from Demonstrations Plus Unlabeled Data

NTR organizes and hosts scientific webinars on neural networks and invites speakers from all over the world to present their recent work. 

On November 9, Alexander Novikov (DeepMind, London, UK) presented a technical Zoom webinar on Offline Imitation Learning from Demonstrations Plus Unlabeled Data. 

About the webinar: 

Behavior cloning (BC) works well for imitation learning because it allows a policy to be trained offline, without rewards, via supervised learning on expert demonstrations. In practice, however, we often have only a few high-quality demonstrations (not nearly enough to train a good BC agent) together with a large corpus of unlabeled trajectories of unknown quality and without reward annotations, which are not directly useful to a BC agent. Such unlabeled data can come from human teleoperation, scripted policies, and other agents running on the same robot.

In this talk I covered a few techniques for training offline RL agents from such data: first learning a reward function by contrasting observations from demonstrator and unlabeled trajectories, then annotating all data with the learned reward, and finally training an agent via offline reinforcement learning.
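The reward-learning step described above can be sketched with a simple discriminator: label demonstrator observations as positives and unlabeled observations as negatives, fit a classifier, and use its output as a per-state reward. The sketch below is a minimal NumPy illustration of that idea (a logistic discriminator trained by gradient descent on synthetic data); the actual method in the talk may use a different model, loss, or regularization, and the function names here are hypothetical.

```python
import numpy as np

def train_reward(demo_obs, unlabeled_obs, epochs=500, lr=0.5):
    """Fit a logistic discriminator: demo observations -> 1, unlabeled -> 0.

    Returns a reward function r(s) = sigmoid(w.s + b), i.e. the probability
    that state s came from the demonstrator distribution.
    """
    X = np.vstack([demo_obs, unlabeled_obs])
    y = np.concatenate([np.ones(len(demo_obs)), np.zeros(len(unlabeled_obs))])
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(demo | s)
        g = p - y                               # gradient of the logistic loss
        w -= lr * (X.T @ g) / len(y)
        b -= lr * g.mean()
    return lambda s: 1.0 / (1.0 + np.exp(-(s @ w + b)))

# Toy data: demonstrator states cluster near +1, unlabeled states are mixed.
rng = np.random.default_rng(0)
demo = rng.normal(1.0, 0.3, size=(100, 2))
unlab = rng.normal(0.0, 1.0, size=(500, 2))

reward = train_reward(demo, unlab)
# Annotate every transition in the unlabeled corpus with the learned reward;
# the relabeled dataset then feeds a standard offline RL algorithm.
annotated_rewards = reward(unlab)
```

States that resemble the demonstrations receive rewards near 1 and dissimilar states near 0, which is what lets the final offline RL step prefer demonstrator-like behavior even on trajectories that were never labeled.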

Webinar presentation.

Moderator and contact: NTR CEO Nick Mikhailovsky: nickm@ntrlab.com.
