My research aims to build a unified algorithmic framework where a robot efficiently infers the optimal value function from a bounded set of interactions with both humans and the environment. It ties together insights from motion planning and imitation learning and applies them to robots deployed in the wild.
| 2020 - 2021 |
| Feedback and Moments in Imitation Learning |
The central question in imitation learning is: under what conditions do imitation errors feed back and compound? We answer this question for a large family of prior imitation learning algorithms through a unified moment-matching framework. We argue that moments are directly tied to the performance difference between a learner and the expert demonstrator, and we provide upper and lower bounds for several classes of such moments.
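To make the link between moments and performance concrete, here is a minimal sketch under assumptions that do not come from the project description. The reward is linear, r(s, a) = w · phi(s, a), in a hypothetical feature map `phi` with ‖w‖ ≤ 1, and every trajectory has the same horizon T. Under these assumptions J(expert) − J(learner) = T · w · (μ_E − μ_L) ≤ T ‖μ_E − μ_L‖₂ by Cauchy-Schwarz, where μ denotes the average feature moments. Matching these moments therefore bounds the performance gap. This covers only one illustrative moment class, not the full set of bounds described above.

```python
import numpy as np

# Hypothetical feature map phi(s, a); states and actions are plain numbers here.
def phi(s, a):
    return np.array([s, a], dtype=float)

def feature_moments(trajectories):
    """Average of phi(s, a) over every (s, a) pair in equal-length trajectories."""
    feats = [phi(s, a) for traj in trajectories for (s, a) in traj]
    return np.mean(feats, axis=0)

def moment_gap(mu_expert, mu_learner):
    """sup over ||w|| <= 1 of w . (mu_expert - mu_learner), i.e. the L2 norm."""
    return float(np.linalg.norm(mu_expert - mu_learner))

def average_return(trajectories, w):
    """Mean over trajectories of sum_t w . phi(s_t, a_t)."""
    return float(np.mean([sum(w @ phi(s, a) for (s, a) in traj) for traj in trajectories]))

T = 2  # horizon: every trajectory below has exactly T steps
expert = [[(0, 1), (1, 1)]]
learner = [[(0, 0), (0, 0)]]
w_true = np.array([0.6, 0.8])  # assumed reward weights with unit norm

mu_E, mu_L = feature_moments(expert), feature_moments(learner)
gap = average_return(expert, w_true) - average_return(learner, w_true)
bound = T * moment_gap(mu_E, mu_L)
print(f"performance gap {gap:.3f} <= moment bound {bound:.3f}")
```

The bound holds for every reward in the assumed class at once. This is why driving the moment gap to zero is enough to guarantee expert-level return without knowing `w_true`.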
| 2018 - 2019 |
| f-Divergence Minimization |
We view imitation learning as minimizing a divergence between the learner's and the expert's state-action distributions. We propose a general framework for estimating and minimizing any f-divergence. By plugging in different divergences, we recover existing algorithms such as Behavior Cloning (forward KL), GAIL (Jensen-Shannon), and DAgger (total variation). Moreover, we identify settings where reverse KL matters and derive new algorithms for minimizing it.
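For reference, an f-divergence between distributions P and Q is D_f(P ‖ Q) = Σ_x Q(x) f(P(x) / Q(x)) for a convex f with f(1) = 0. The sketch below computes it exactly for small discrete distributions, using the standard generators for the four divergences named above. It assumes that P is the expert distribution, Q is the learner distribution, and every learner probability is strictly positive. It is illustrative only. The framework described here estimates and minimizes these divergences from samples, which this exact computation does not attempt.

```python
import math

def _u_log_u(u):
    # Continuous extension: u * log(u) -> 0 as u -> 0.
    return u * math.log(u) if u > 0 else 0.0

# Generators f(u), with u = p(x) / q(x), for D_f(P || Q) = sum_x q(x) f(u).
GENERATORS = {
    "forward_kl": _u_log_u,                    # KL(P || Q)
    "reverse_kl": lambda u: -math.log(u),      # KL(Q || P); needs p(x) > 0
    "jensen_shannon": lambda u: 0.5 * (_u_log_u(u) - (u + 1) * math.log((u + 1) / 2)),
    "total_variation": lambda u: 0.5 * abs(u - 1),
}

def f_divergence(p, q, name):
    """Exact D_f(P || Q) for discrete distributions given as equal-length lists."""
    f = GENERATORS[name]
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))

expert = [0.5, 0.5]   # assumed expert distribution over two state-action bins
learner = [0.9, 0.1]  # assumed learner distribution over the same bins
for name in GENERATORS:
    print(name, round(f_divergence(expert, learner, name), 4))
```

Under this convention, forward KL heavily penalizes a learner that puts little probability where the expert has mass (mode covering). Reverse KL heavily penalizes a learner that puts mass where the expert has little (mode seeking). This asymmetry is the usual reason reverse KL is of interest.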
| 2019 - 2020 |
| Human-Centric Imitation Learning |
Typical imitation learning algorithms rely on either interactive feedback or kinesthetic demonstrations, both of which are expensive, repetitive, and often unnatural for an expert to provide. Can we learn from less burdensome expert inputs such as interventions, corrections or hints? We formalize these problems and provide algorithms that learn the correct behavior even with such minimal interaction.
| 2016 - 2017 |
| Imitation of Clairvoyant Oracles |
We look at POMDP problems where the latent space is large, e.g., the space of all possible maps. Directly computing optimal policies for all possible beliefs is intractable. However, during training we can be clairvoyant, i.e., we know the ground-truth MDP and can compute optimal plans. We show how to properly imitate such clairvoyant oracles to obtain good, and sometimes near-optimal, POMDP policies.
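One simple way to imitate a clairvoyant oracle is to train the learner's action values on the belief to match the oracle's cost-to-go, averaged over worlds drawn from that belief: Q(b, a) ≈ E_{φ∼b}[Q_oracle(φ, a)]. This QMDP-style approach is assumed here for illustration and is not necessarily the method described above. The toy below uses two hypothetical worlds (a door on the left or on the right) with made-up costs. The oracle sees the true world only at training time, and the learned policy acts on the belief alone.

```python
import random

ACTIONS = ["go_left", "go_right", "sense_first"]

# Hypothetical clairvoyant oracle: knowing the true world, it gives the optimal
# cost-to-go after taking each first action (made-up numbers).
ORACLE_COST = {
    "door_left":  {"go_left": 1.0,  "go_right": 10.0, "sense_first": 3.0},
    "door_right": {"go_left": 10.0, "go_right": 1.0,  "sense_first": 3.0},
}

def train_q_from_oracle(belief, n_samples=2000, seed=0):
    """Regress Q(b, a) onto oracle costs in worlds sampled from the belief.

    With a tabular learner and squared loss, the fitted value is the sample mean.
    """
    rng = random.Random(seed)
    worlds = list(belief)
    weights = [belief[w] for w in worlds]
    totals = {a: 0.0 for a in ACTIONS}
    for _ in range(n_samples):
        world = rng.choices(worlds, weights=weights)[0]  # ground truth, training only
        for a in ACTIONS:
            totals[a] += ORACLE_COST[world][a]
    return {a: totals[a] / n_samples for a in ACTIONS}

belief = {"door_left": 0.5, "door_right": 0.5}
q_hat = train_q_from_oracle(belief)
policy_action = min(q_hat, key=q_hat.get)  # lowest expected cost under the belief

# Copying the oracle's own greedy action instead never chooses to sense.
oracle_actions = {w: min(c, key=c.get) for w, c in ORACLE_COST.items()}
print(q_hat, policy_action, oracle_actions)
```

In this toy the learner prefers to sense first, because committing blindly is expensive on average. Copying the oracle's actions would never sense, since the oracle already knows where the door is. Averaging oracle values does not, in general, value information gathering correctly, so the sketch illustrates the idea rather than guaranteeing near-optimality.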