My research aims to build a unified algorithmic framework where a robot efficiently infers the optimal value function from a bounded set of interactions with both humans and the environment. It ties together insights from motion planning and imitation learning and applies them to robots deployed in the wild.


2020 - 2021


Feedback and Moments in Imitation Learning

A central question in imitation learning is: under what conditions do imitation errors feed back and compound? We answer this question for a large family of prior imitation learning algorithms via a unified framework of moment matching. We argue that moments directly relate to the performance difference between the learner and the expert demonstrator, and we provide upper and lower bounds for various classes of such moments.
Papers:  arXiv'21, arXiv'21
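
To make the moment-matching view concrete, here is a minimal, self-contained sketch (illustrative only, not the algorithm from the paper; the toy features, data, and function names are made up). It compares empirical feature moments of the expert's and the learner's state-action distributions; for rewards linear in these features, the moment gap bounds the performance difference.

```python
# Minimal sketch of moment matching in imitation learning (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

def feature_moments(trajectories, featurize):
    """Average feature vector over all (state, action) pairs in the trajectories."""
    feats = [featurize(s, a) for traj in trajectories for (s, a) in traj]
    return np.mean(feats, axis=0)

# Toy data: states and actions are scalars, features are [s, a, s*a].
featurize = lambda s, a: np.array([s, a, s * a])
expert_trajs = [[(rng.normal(), rng.normal()) for _ in range(10)] for _ in range(5)]
learner_trajs = [[(rng.normal(0.5), rng.normal(0.5)) for _ in range(10)] for _ in range(5)]

expert_mu = feature_moments(expert_trajs, featurize)
learner_mu = feature_moments(learner_trajs, featurize)

# For rewards linear in the features, this gap in moments upper-bounds the
# performance difference between learner and expert (standard apprenticeship-
# learning argument); driving it to zero is the moment-matching objective.
print("moment gap:", np.linalg.norm(expert_mu - learner_mu))
```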

2018 - 2019


f-Divergence Minimization

We view imitation learning as minimizing the divergence between the learner's and the expert's state-action distributions. We propose a general framework for estimating and minimizing any f-divergence. By plugging in different divergences, we recover existing algorithms such as Behavior Cloning (Forward KL), GAIL (Jensen-Shannon), and DAgger (Total Variation). Moreover, we motivate cases where Reverse KL matters and derive new algorithms for minimizing it.
Papers:  WAFR'20
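
A small plug-in illustration of the framework's core idea (a sketch under simplifying assumptions, not the estimators from the paper): for two known discrete state-action distributions, swapping the generator f in D_f(p || q) = Σ_x q(x) f(p(x)/q(x)) recovers different divergences.

```python
# Hedged sketch: plug-in f-divergences between two known discrete distributions
# p ("expert") and q ("learner"). Each generator f yields a different divergence
# from the same formula D_f(p || q) = sum_x q(x) * f(p(x) / q(x)).
import numpy as np

F_GENERATORS = {
    "forward_kl":      lambda t: t * np.log(t),        # KL(p || q)
    "reverse_kl":      lambda t: -np.log(t),           # KL(q || p)
    "jensen_shannon":  lambda t: 0.5 * (t * np.log(2 * t / (t + 1)) + np.log(2 / (t + 1))),
    "total_variation": lambda t: 0.5 * np.abs(t - 1),
}

def f_divergence(p, q, name):
    t = p / q                                           # density ratio p(x) / q(x)
    return float(np.sum(q * F_GENERATORS[name](t)))

p = np.array([0.5, 0.3, 0.2])                           # "expert" distribution
q = np.array([0.4, 0.4, 0.2])                           # "learner" distribution
for name in F_GENERATORS:
    print(f"{name:16s} {f_divergence(p, q, name):.4f}")
```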

2019 - 2020


Human-Centric Imitation Learning

Typical imitation learning algorithms rely on either interactive feedback or kinesthetic demonstrations, both of which are expensive, repetitive, and often unnatural for an expert to provide. Can we instead learn from less burdensome expert inputs such as interventions, corrections, or hints? We formalize these problems and provide algorithms that learn the correct behavior even from such minimal interaction.
Papers:  RSS'20   /   Videos: Spotlight Talk
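
As a toy illustration of the intervention setting (a simplified sketch, not the RSS'20 algorithm; the environment, policies, and intervention rule below are all invented): the robot rolls out its current policy, and only the states where the human steps in with a corrective action are added to the training set.

```python
# Hedged sketch of learning from interventions (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

def run_episode_with_interventions(policy, expert, env_reset, env_step, horizon=50):
    """Roll out the learner's policy; whenever the (simulated) human judges the
    action too far off, record and execute their corrective action instead."""
    data = []
    s = env_reset()
    for _ in range(horizon):
        a_robot = policy(s)
        a_human = expert(s)
        if abs(a_robot - a_human) > 0.5:        # human intervenes only on large errors
            data.append((s, a_human))
            a = a_human
        else:
            a = a_robot
        s = env_step(s, a)
    return data

# Toy 1-D task: the expert regulates the state to zero, the untrained learner does nothing.
env_reset = lambda: 0.0
env_step = lambda s, a: s + 0.1 * a + 0.1       # constant drift the expert must counteract
expert = lambda s: -s
policy = lambda s: 0.0

dataset = run_episode_with_interventions(policy, expert, env_reset, env_step)
print(f"collected {len(dataset)} intervention-labeled (state, action) pairs")
```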

2016 - 2017


Imitation of Clairvoyant Oracles

We look at POMDP problems where the latent space is large, e.g., the space of all possible maps. Directly computing optimal policies for all possible beliefs is intractable. However, during training we can be clairvoyant, i.e., we know the ground-truth MDP and can compute optimal plans. We show how to properly imitate such clairvoyant oracles to obtain good, and sometimes near-optimal, POMDP policies.
Papers:  IJRR'18 (finalist), RSS'17, ICRA'17   /   Videos: Long 1, Long 2, Short 1
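
A toy sketch of the training setup (illustrative only, not the method from these papers; the 1-D task and all names are made up): a clairvoyant oracle that sees the latent world labels every state the learner visits with the optimal action, while the learner's policy must act from a partial, noisy observation.

```python
# Hedged sketch of imitating a clairvoyant oracle on a toy 1-D navigation task.
import numpy as np

rng = np.random.default_rng(0)

def oracle_action(state, hidden_goal):
    """Clairvoyant oracle: knows the latent goal and moves straight toward it."""
    return np.sign(hidden_goal - state)

def observe(state, hidden_goal):
    """Partial observation: a noisy sensor reading of the offset to the goal."""
    return (hidden_goal - state) + 0.5 * rng.normal()

def rollout_and_label(policy, n_episodes=50, horizon=20):
    """Roll out the learner's policy, labeling every visited state with the oracle's action."""
    obs, labels = [], []
    for _ in range(n_episodes):
        goal = rng.uniform(-3.0, 3.0)           # latent world, hidden from the learner
        state = 0.0
        for _ in range(horizon):
            o = observe(state, goal)
            obs.append(o)
            labels.append(oracle_action(state, goal))
            state += policy(o)                  # learner acts on its own observation
    return np.array(obs), np.array(labels)

# Imitate the oracle with a simple threshold policy fit on the aggregated data.
untrained = lambda o: 0.0
obs, labels = rollout_and_label(untrained)
thresholds = np.linspace(-1.0, 1.0, 41)
best = min(thresholds, key=lambda t: np.mean(np.where(obs > t, 1.0, -1.0) != labels))
print(f"learned policy: move right when observation > {best:.2f}")
```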