My research aims to build a unified algorithmic framework where a robot efficiently infers the optimal value function from a bounded set of interactions with both humans and the environment. It ties together insights from motion planning and imitation learning and applies them to robots deployed in the wild.
| 2017 - 2019 | Blending MPC & Value Function Approximation |
We present a framework for improving on model predictive control (MPC) with model-free reinforcement learning (RL). The key insight is to view MPC as constructing a series of local Q-function approximations. By appropriately blending these Q approximations over time, we can systematically trade off model errors against learned value errors.
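One way to picture the blend is a λ-weighted mixture of lookahead estimates of every horizon, analogous to TD(λ): short horizons lean on the learned value, long horizons lean on the model. The sketch below is illustrative only, assuming a deterministic model, a finite action set, and greedy bootstrapping with a learned Q; all function names (`mpc_q`, `blended_q`, `q_hat`) are mine, not the paper's API.

```python
import numpy as np

def mpc_q(model, reward, q_hat, actions, s, a, horizon, gamma=0.99):
    """H-step lookahead Q estimate: roll a (possibly inaccurate) model
    forward for `horizon` steps, then bootstrap with the learned q_hat."""
    total, discount = 0.0, 1.0
    for _ in range(horizon):
        total += discount * reward(s, a)
        s = model(s, a)
        a = max(actions, key=lambda u: q_hat(s, u))  # greedy w.r.t. learned Q
        discount *= gamma
    return total + discount * q_hat(s, a)

def blended_q(model, reward, q_hat, actions, s, a, horizon, lam, gamma=0.99):
    """Convex, lambda-weighted blend of horizon-0..H lookahead estimates.
    lam -> 0 trusts the learned value; lam -> 1 trusts the model."""
    qs = [mpc_q(model, reward, q_hat, actions, s, a, h, gamma)
          for h in range(horizon + 1)]
    weights = [(1 - lam) * lam**h for h in range(horizon)] + [lam**horizon]
    return float(np.dot(weights, qs))  # weights sum to 1
```

At `lam=0` this reduces to the model-free estimate `q_hat(s, a)`; at `lam=1` it is the pure H-step model rollout.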
| 2017 - 2019 | Bayesian Reinforcement Learning |
Addressing uncertainty is critical for autonomous systems to robustly adapt to the real world. We formulate the problem of model uncertainty as a Bayes-Adaptive Markov Decision Process (BAMDP), where an agent maintains a posterior distribution over latent model parameters given a history of observations and maximizes its expected long-term reward with respect to this belief distribution. We propose algorithms to solve continuous BAMDPs efficiently.
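The belief maintenance at the core of a BAMDP is just Bayes' rule applied after each observed transition. A minimal sketch for a discrete set of model hypotheses (the `models[i](s, a, s_next)` likelihood interface is an assumption for illustration, not the paper's formulation, which handles continuous latent parameters):

```python
import numpy as np

def belief_update(belief, models, s, a, s_next):
    """Bayes update of the posterior over latent model hypotheses after
    observing transition (s, a, s_next). models[i](s, a, s_next) returns
    the transition likelihood under hypothesis i."""
    likelihoods = np.array([m(s, a, s_next) for m in models])
    posterior = belief * likelihoods
    z = posterior.sum()
    # If the observation has zero likelihood under every hypothesis,
    # keep the prior rather than divide by zero.
    return posterior / z if z > 0 else np.asarray(belief, dtype=float)
```

The agent then plans in belief space: actions are chosen to maximize expected return under this posterior, which naturally trades off information gathering against reward.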
| 2016 - 2017 | Bayesian Traveler's Problem |
Consider a traveler on a graph who must reach a goal (or cover a set of goals) but does not know which edges are traversable. Traversability is revealed only when the traveler attempts an edge (or visits an adjacent vertex). Given a prior over edges, how should the traveler move to minimize expected travel time? Many real robotics applications are instances of this problem, e.g., manipulation under occlusion.
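Even a toy special case shows the core tradeoff. Suppose the traveler can attempt candidate routes in sequence, each with an independent prior probability of being open, and a blocked attempt costs a round trip before trying the next route. This sketch is a simplified illustration under those assumptions, not the general graph formulation:

```python
def expected_cost(routes):
    """Expected travel time when attempting (length, p_open) routes in
    the given order. A blocked route is discovered after a round trip of
    2 * length; the final route must be known traversable (p_open == 1)."""
    if not routes:
        return float("inf")
    length, p_open = routes[0]
    if len(routes) == 1:
        return length if p_open == 1.0 else float("inf")
    return p_open * length + (1 - p_open) * (2 * length + expected_cost(routes[1:]))
```

For a short risky route (length 5, open with probability 0.6) versus a long safe route (length 12), gambling on the short route first costs 0.6·5 + 0.4·(10 + 12) = 11.8 in expectation, beating the safe 12, so the prior justifies the gamble. The full problem asks this question over arbitrary graphs and correlated priors.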