Hello! I am an Assistant Professor of Industrial Engineering at the University of Pittsburgh. My primary research interests are in the methodological areas of reinforcement learning, approximate dynamic programming, and sequential decision making. I am also interested in a variety of operations research applications, such as energy markets, ride-sharing, and public health. I received my Ph.D. in Operations Research and Financial Engineering from Princeton University in May 2016, where I was advised by Warren B. Powell.
Exploration via Sample-Efficient Subgoal Design
Brief Description: We consider problems where an agent faces an unknown task (drawn from a distribution of MDPs) in the future and is given prior opportunities to "practice" on related tasks where the interactions are still expensive. We propose a one-step Bayes-optimal algorithm for selecting subgoal designs, along with the number of episodes and the episode length during training, to efficiently maximize the expected performance of the agent at test time.
Inventory Repositioning in On-Demand Product Rental Networks
Brief Description: We consider a product rental network with a fixed number of rental units distributed across multiple locations. We show convexity of the value function and that the optimal policy can be described in terms of a well-specified region over the state space. We leverage these results in an infinite-horizon, cutting-plane-based ADP algorithm and prove its asymptotic optimality, improving upon previous convergence results in the literature.
Structured Actor-Critic for Managing and Dispensing Public Health Inventory
Brief Description: We consider the setting of public health inventory control/dispensing and propose a new actor-critic algorithm that tracks both policy and value function approximations. The algorithm utilizes structure in both the policy and value to improve the empirical convergence rate. We also provide a case study for the problem of dispensing naloxone (an overdose reversal drug) via mobile needle exchange clinics amidst the ongoing opioid crisis.
Feedback-Based Tree Search for Reinforcement Learning
Brief Description: We describe a technique that iteratively applies Monte Carlo tree search (MCTS) on batches of small, finite-horizon versions of the original infinite-horizon MDP. We show that a deep neural network implementation of the technique can create a competitive AI agent for a popular multi-player online battle arena (MOBA) game.
Practicality of Nested Risk Measures for Dynamic Electric Vehicle Charging
Major revision at Manufacturing & Service Operations Management, 2019.
Brief Description: Risk-averse MDPs formulated with nested (dynamic) risk measures are often used as a tool for solving problems with predefined "practical" risk and reward metrics. In this paper, we study the extent to which the two sides of this framework are compatible with each other in the setting of dynamic EV charging. Roughly speaking, does a "more risk-averse" MDP provide lower risk in the practical sense as well?
Optimistic Monte Carlo Tree Search with Sampled Information Relaxation Dual Bounds
Operations Research, accepted for publication, 2019.
Brief Description: MCTS is a well-known strategy for solving sequential decision problems, particularly in the area of game-play AI. We propose a new technique called Primal-Dual MCTS that utilizes sampled information relaxation (Brown et al., 2010) bounds on potential actions in order to make tree expansion decisions. The approach shows promise when used to optimize the behavior of a driver navigating a graph while operating on a ride-sharing platform.
Shape Constraints in Economics and Operations Research
Brief Description: This paper reviews an illustrative set of research on shape-constrained estimation in the economics and operations research literature. We highlight the methodological innovations and applications, with a particular emphasis on utility functions, production economics, and sequential decision making.
Risk-Averse Approximate Dynamic Programming with Quantile-Based Risk Measures
Brief Description: We propose a new Q-learning algorithm and a companion sampling procedure to solve risk-averse Markov decision processes under a class of dynamic quantile-based risk measures. Convergence results are proven and an application to energy storage is shown.
An Approximate Dynamic Programming Algorithm for Monotone Value Functions
Brief Description: We describe a provably convergent algorithm to exploit the structural property of monotonicity that arises in many applications in operations research, finance, and economics. We show via simulations that near-optimal solutions can be obtained using the proposed method when the exact approach is computationally intractable.
Optimal Hour-Ahead Bidding in the Real-Time Electricity Market with Battery Storage using Approximate Dynamic Programming
Brief Description: We formulate a mathematical model for bidding in the real-time market with the goal of performing energy arbitrage (i.e., exploiting variations in spot prices to profit) in the presence of storage. We train and test an approximate dynamic programming policy on real spot price data from the NYISO and show its value over heuristic policies used in industry.
Decision Models, Undergraduate/Master's Level
Instructor, Fall 2016, Fall 2017, Fall 2018
Reinforcement Learning, Master's Level
Instructor, Summer 2018
Probability and Stochastic Systems, Instructor: Prof. Ramon van Handel
Assistant in Instruction, Fall 2015
Senior Thesis Writing Group, ORFE Department
Group Leader, Academic Years 2013-2016
Operations and Information Engineering, Instructor: Prof. Warren B. Powell
Assistant in Instruction, Fall 2013, Fall 2014
Optimal Learning, Instructor: Prof. Warren B. Powell
Assistant in Instruction, Spring 2013
Fantasy Football: Use this app to quantify the role of luck in (Yahoo!) Fantasy Football by generating probability distributions of your record over randomized season schedules.
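As a rough illustration of the idea (not the app's actual code), the sketch below estimates the distribution of a team's win total by redrawing its opponent at random each week. The function name `record_distribution`, the `weekly_scores` input format, and the simplification of sampling opponents independently each week (rather than building a full valid league schedule) are all assumptions made for this example. Ties are counted as non-wins.

```python
import random
from collections import Counter


def record_distribution(weekly_scores, team, n_sims=10_000, seed=0):
    """Estimate P(number of wins) for `team` under randomized opponents.

    weekly_scores: dict mapping team name -> list of weekly point totals
                   (hypothetical input format; all lists have equal length).
    Returns a dict mapping win total -> estimated probability.
    """
    rng = random.Random(seed)
    opponents = [t for t in weekly_scores if t != team]
    n_weeks = len(weekly_scores[team])
    win_counts = Counter()

    for _ in range(n_sims):
        wins = 0
        for week in range(n_weeks):
            # Simplifying assumption: opponent drawn uniformly each week.
            opp = rng.choice(opponents)
            if weekly_scores[team][week] > weekly_scores[opp][week]:
                wins += 1
        win_counts[wins] += 1

    return {w: c / n_sims for w, c in sorted(win_counts.items())}
```

A team whose actual win total sits far in the tail of this distribution was, under these assumptions, unusually lucky or unlucky with its schedule.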