Hello! I am a Research Scientist at Facebook and on leave as Assistant Professor at the University of Pittsburgh since 2019. My research interests are in the methodological areas of reinforcement learning, approximate DP, and sequential decision making. I am also interested in a variety of operations research applications, such as energy markets, ride-sharing, and public health. I received my Ph.D. in May 2016 in Operations Research and Financial Engineering from Princeton University, where I was advised by Warren B. Powell.
Working paper (draft available), 2020.
Brief Description: We introduce the lookahead-bounded Q-learning (LBQL) algorithm, a new, provably convergent variant of Q-learning that seeks to make better use of collected experience through the use of noisy "lookahead" upper and lower bounds that constrain the Q-iterates. The algorithm operates via a "feedback loop" by using approximate Q-values to estimate bounds and subsquently using those bounds to improve the Q-values (and repeat).
BoTorch: Programmable Bayesian Optimization in PyTorch
Brief Description: Bayesian optimization provides sample-efficient global optimization for a broad range of applications, including automatic machine learning, molecular chemistry, and experimental design. We introduce BoTorch, a modern programming framework for Bayesian optimization, along with a new "one-shot" approach to optimizing the Knowledge Gradient acquisition function.
Exploration via Sample-Efficient Subgoal Design
Brief Description: We consider problems where an agent faces an unknown task (drawn from a distribution of MDPs) in the future and is given prior opportunities to "practice" on related tasks where the interactions are still expensive. We propose a one-step Bayes-optimal algorithm for selecting subgoal designs, along with the number of episodes and the episode length during training, to efficiently maximize the expected performance of the agent at test time.
Structured Actor-Critic for Managing Public Health Points-of-Dispensing
Brief Description: We consider the setting of public health medical inventory control/dispensing and propose a new actor-critic algorithm that tracks both policy and value function approximations. The algorithm utilizes structure in both the policy and value to improve the empirical convergence rate. We also provide a case study for the problem of dispensing naloxone (an overdose reversal drug) amidst the ongoing opioid crisis.
Dynamic Inventory Repositioning in On-Demand Rental Networks
Major revision at Management Science, 2020.
Brief Description: We consider a product rental network with a fixed number of rental units distributed across multiple locations. We show convexity of the value function and that the optimal policy can be described in terms of a well-specified region over the state space. We leverage these results in an infinite-horizon, cutting-plane-based ADP algorithm and prove its asymptotic optimality, improving upon previous convergence results in the literature.
Feedback-Based Tree Search for Reinforcement Learning
Brief Description: We describe a technique that iteratively applies MCTS on batches of small, finite-horizon versions of the original infinite-horizon MDP. We show that a deep neural network implementation of the technique can create a competitive AI agent for a popular multi-player online battle arena (MOBA) game.
Practicality of Nested Risk Measures for Dynamic Electric Vehicle Charging
Major revision at Manufacturing & Service Operations Management, 2019.
Brief Description: Risk-averse MDPs formulated with nested (dynamic) risk measures are often used as a tool for solving problems with predefined "practical" risk and reward metrics. In this paper, we study the extent to which the two-sides of this framework are compatible with each other in the setting of dynamic EV charging — roughly speaking, does a "more risk-averse" MDP provide lower risk in the practical sense as well?
Optimistic Monte Carlo Tree Search with Sampled Information Relaxation Dual Bounds
Operations Research, accepted for publication, 2019.
Brief Description: MCTS is a well-known strategy for solving sequential decision problems, particularly in the area of game-play AI. We propose a new technique called Primal-Dual MCTS that utilizes sampled information relaxation (Brown et. al., 2010) bounds on potential actions in order to make tree expansion decisions. The approach shows promise when used to optimize the behavior of a driver navigating a graph while operating on a ride-sharing platform.
Shape Constraints in Economics and Operations Research
Brief Description: This paper reviews an illustrative set of research on shape constrained estimation in the economics and operations research literature. We highlight the methodological innovations and applications, with a particular emphasis on utility functions, production economics, and sequential decision making applications.
Risk-Averse Approximate Dynamic Programming with Quantile-Based Risk Measures
Brief Description: We propose a new Q-learning algorithm and a companion sampling procedure to solve risk-averse Markov decision processes under a class of dynamic quantile-based risk measures. Convergence results are proven and an application to energy storage is shown.
An Approximate Dynamic Programming Algorithm for Monotone Value Functions
Brief Description: We describe a provably convergent algorithm to exploit the structural property of monotonicity that arises in many applications in operations research, finance, and economics. We show via simulations that near optimal solutions can be obtained using the proposed method when the exact approach is computationally intractable.
Optimal Hour-Ahead Bidding in the Real-Time Electricity Market with Battery Storage using Approximate Dynamic Programming
Brief Description: We formulate a mathematical model for bidding in the real-time market with the goal of performing energy arbitrage (i.e., exploiting variations in spot prices to profit) in the presence of storage. We train and test an approximate dynamic programming policy on real spot price data from the NYISO and show its value over heuristic policies used in industry.
Approximate Dynamic Programming, Ph.D. Level
Instructor, Spring 2017, Fall 2018
Course Description: ADP refers to a broad set of computational methods used for finding approximately optimal policies of intractable sequential decision problems (MDPs). We'll begin with an overview of classical methods and transition to a survey of state-of-the-art developments. The lectures will focus on mathematical proofs and underlying theory while the course project will allow students practice with numerical implementations. All lecture notes, based on a number of papers and texts (primarily Bertsekas and Tsitsiklis), are available online.
Decision Models, Undergraduate/Master's Level
Instructor, Fall 2016, Fall 2017, Fall 2018
Course Description: Decision making is the key in understanding a variety of problems in industry, including inventory control, revenue management, pricing, energy, healthcare, logistics, and finance. In this course, we focus on stochastic decision models (i.e., "decision making under uncertainty") and discuss the fundamental methodology/models in conjunction with applications to real world problems. Students should have a basic understanding of probability and optimization (linear programming).
Ride-sharing Analytics Game, Undergraduate/Master's Level
Instructor, Fall 2018
Description: This was an event designed for the Decision Models course, where teams of students (1) analyze a data set, (2) design a pricing, manufacturing, advertising, and repositioning strategy to operate a ride-sharing company, and (3) compete in a live competition where decisions are submitted periodically to a simulator designed specifically for the course. If you are an instructor and interested in running this event in your course, please let me know. See the description and data for more information.
Reinforcement Learning, Master's Level
Instructor, Summer 2018
Course Description: This is an introductory course on reinforcement learning (RL). The basics of MDPs necessary for RL will be covered, along with a wide range of methods (e.g., TD learning, Q-learning, policy gradients) that perform evaluation and control will be covered. The focus in this course will be on applications, implementation, intuition, and some theory. All lecture notes, based on the Sutton & Barto RL textbook, are available online.
Probability and Stochastic Systems, Instructor: Prof. Ramon van Handel
Assistant in Instruction, Fall 2015
Senior Thesis Writing Group, ORFE Department
Group Leader, Acad. Years 2013-2016
Operations and Information Engineering, Instructor: Prof. Warren B. Powell
Assistant in Instruction, Fall 2013, Fall 2014
Optimal Learning, Instructor: Prof. Warren B. Powell
Assistant in Instruction, Spring 2013
Fantasy Football: Use this app to quantify the role of luck in (Yahoo!) Fantasy Football by generating probability distributions of your record over randomized season schedules.