My research aims to build a unified algorithmic framework where a robot efficiently infers the optimal value function from a bounded set of interactions with both humans and the environment. It ties together insights from motion planning and imitation learning and applies them to robots deployed in the wild.
2017 - 2019 | Blending MPC & Value Function Approximation | We present a framework for improving on model predictive control (MPC) with model-free reinforcement learning (RL). The key insight is to view MPC as constructing a series of local Q-function approximations. By appropriately blending these Q approximations over time, we can systematically trade off model errors against learned value errors.
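To make the blending idea concrete, here is a minimal Python sketch, not the implementation from this work: it estimates Q(s, a) by rolling out an approximate dynamics model, bootstrapping each k-step return with a learned value function, and exponentially weighting those returns so a single parameter trades off trust in the model against trust in the learned value. The names `model.step`, `policy`, and `learned_value` are hypothetical placeholders.

```python
import numpy as np

def blended_q(state, action, model, policy, learned_value,
              horizon=10, gamma=0.99, lam=0.9):
    """Sketch of a blended Q estimate (hypothetical API, not the paper's code).

    Rolls out an approximate model for `horizon` steps, forms k-step returns
    bootstrapped with a learned value function, and blends them with
    exponential weights lam**(k-1).
    """
    # Roll out the approximate model once, recording rewards and states.
    rewards, states = [], []
    s, a = state, action
    for _ in range(horizon):
        s_next, r = model.step(s, a)      # hypothetical one-step model
        rewards.append(r)
        states.append(s_next)
        s, a = s_next, policy(s_next)     # hypothetical rollout policy

    # k-step bootstrapped returns: sum of k discounted rewards
    # plus gamma^k times the learned value at the k-th state.
    returns = []
    g = 0.0
    for k, (r, s_k) in enumerate(zip(rewards, states), start=1):
        g += gamma ** (k - 1) * r
        returns.append(g + gamma ** k * learned_value(s_k))

    # Exponentially weighted blend of the k-step returns (TD(lambda)-style).
    weights = np.array([lam ** (k - 1) for k in range(1, horizon + 1)])
    weights /= weights.sum()
    return float(np.dot(weights, returns))
```

With lam near 1 this leans on the long model rollout (pure MPC behavior); with lam near 0 it reduces to a one-step return bootstrapped almost entirely by the learned value, which is the model-error versus value-error trade-off described above.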
2017 - 2019 | Bayesian Reinforcement Learning | Addressing uncertainty is critical for autonomous systems to robustly adapt to the real world. We formulate the problem of model uncertainty as a Bayes-Adaptive Markov Decision Process (BAMDP), where an agent maintains a posterior distribution over latent model parameters given a history of observations and maximizes its expected long-term reward with respect to this belief distribution. We propose algorithms to solve continuous BAMDPs efficiently.
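As a sketch of this objective in my own notation (not necessarily the exact formulation used in the papers), the BAMDP value function is defined over joint state-belief pairs and satisfies a Bellman equation of the form

$$
V^*(s, b) \;=\; \max_{a}\; \mathbb{E}_{\phi \sim b}\big[ R(s, a; \phi) \big] \;+\; \gamma \sum_{s'} \Pr(s' \mid s, a, b)\, V^*(s', b'),
$$

where $\Pr(s' \mid s, a, b) = \int b(\phi)\, T(s' \mid s, a; \phi)\, d\phi$ marginalizes the transition model over the current belief, and $b'(\phi) \propto T(s' \mid s, a; \phi)\, b(\phi)$ is the Bayes-updated posterior after observing the transition $(s, a, s')$. Planning in this augmented state-belief space is what makes continuous BAMDPs challenging to solve efficiently.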
2016 - 2017 | Bayesian Traveler's Problem | Consider a traveler on a graph who must reach a goal (or cover a set of goals) but does not know which edges are traversable. Traversability is revealed only when the traveler attempts an edge (or visits an adjacent vertex). Given a prior over edges, how should the traveler move to minimize expected travel time? Many real robotics applications are instances of this problem, e.g., manipulation under occlusion.
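To illustrate the interaction model (edge status revealed only on attempt), here is a small Python sketch of a common optimistic-replanning baseline rather than a Bayesian-optimal policy; the graph representation (a dict of dicts of edge costs) and the `is_traversable` callback are assumptions for illustration. A Bayesian policy would additionally use the prior over edges to decide which edges are worth attempting first.

```python
import heapq

def shortest_path(graph, source, goal, blocked):
    """Dijkstra over edges not yet known to be blocked; returns a node list or None."""
    dist, prev = {source: 0.0}, {}
    pq = [(0.0, source)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == goal:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in graph[u].items():
            if (u, v) in blocked or (v, u) in blocked:
                continue
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(pq, (nd, v))
    if goal not in dist:
        return None
    path, node = [goal], goal
    while node != source:
        node = prev[node]
        path.append(node)
    return list(reversed(path))

def optimistic_replanning(graph, source, goal, is_traversable):
    """Greedy baseline: assume unknown edges are traversable, follow the
    shortest path, and replan whenever an attempted edge is revealed blocked."""
    blocked, current, travelled = set(), source, 0.0
    while current != goal:
        path = shortest_path(graph, current, goal, blocked)
        if path is None:
            raise RuntimeError("goal unreachable given revealed blockages")
        nxt = path[1]
        if is_traversable(current, nxt):   # traversability revealed on attempt
            travelled += graph[current][nxt]
            current = nxt
        else:
            blocked.add((current, nxt))
    return travelled
```

The Bayesian version of the problem asks how much better the traveler can do by reasoning about the prior over edge traversability, rather than replanning optimistically as above.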