Daniel R. Jiang

Hello! I am an Assistant Professor of Industrial Engineering at the University of Pittsburgh. My primary research interests are in the methodological areas of reinforcement learning, approximate dynamic programming, and sequential decision making. I am also interested in a variety of operations research applications, such as energy markets, ride-sharing, and public health. I received my Ph.D. in Operations Research and Financial Engineering from Princeton University in May 2016, advised by Warren B. Powell.

drjiang@pitt.edu

CV (PDF)

Publications and Papers Under Review

Exploration via Sample-Efficient Subgoal Design

Yijia Wang, Matthias Poloczek, Daniel R. Jiang

Submitted, 2019.

Brief Description: We consider problems in which an agent will face an unknown task (drawn from a distribution of MDPs) and is given prior opportunities to "practice" on related tasks, where interactions remain expensive. We propose a one-step Bayes-optimal algorithm for selecting subgoal designs, along with the number of episodes and the episode length during training, to efficiently maximize the expected performance of the agent at test time.
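
A minimal sketch of the one-step value-of-information flavor of the approach, in Python. Everything here is an illustrative assumption: a Gaussian belief over each candidate subgoal design's value, a unit-variance observation model, and a knowledge-gradient-style score; the paper's actual model and acquisition rule differ in the details.

    import math
    import random

    def practice(design):                      # stand-in for an expensive training run
        return design * 0.1 + random.gauss(0, 1.0)

    def voi(mu, sigma2, best_other, noise2=1.0):
        # One-step expected improvement of the posterior mean over the best
        # alternative (a knowledge-gradient-style score).
        sigma_tilde = sigma2 / math.sqrt(sigma2 + noise2)
        z = -abs(mu - best_other) / max(sigma_tilde, 1e-9)
        phi = math.exp(-z * z / 2) / math.sqrt(2 * math.pi)
        Phi = 0.5 * (1 + math.erf(z / math.sqrt(2)))
        return sigma_tilde * (phi + z * Phi)

    designs = list(range(5))
    mu = {d: 0.0 for d in designs}             # belief means
    sigma2 = {d: 4.0 for d in designs}         # belief variances

    for _ in range(20):                        # limited practice budget
        scores = {d: voi(mu[d], sigma2[d], max(mu[e] for e in designs if e != d))
                  for d in designs}
        d = max(scores, key=scores.get)        # most informative design to practice
        y = practice(d)
        k = sigma2[d] / (sigma2[d] + 1.0)      # Gaussian conjugate update
        mu[d] += k * (y - mu[d])
        sigma2[d] *= 1 - k

    print("estimated best design:", max(mu, key=mu.get))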

Inventory Repositioning in On-Demand Product Rental Networks

Saif Benjaafar, Daniel R. Jiang, Xiang Li, and Xiaobo Li

Submitted, 2018.

Brief Description: We consider a product rental network with a fixed number of rental units distributed across multiple locations. We show convexity of the value function and that the optimal policy can be described in terms of a well-specified region over the state space. We leverage these results in an infinite-horizon, cutting-plane-based ADP algorithm and prove its asymptotic optimality, improving upon previous convergence results in the literature.
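
A hedged sketch of the cutting-plane idea, assuming a one-dimensional convex stand-in for the value function (the rental network's state is higher-dimensional): each iteration adds a linear cut tangent to the function at a sampled state, so the max over cuts is a lower approximation that tightens over time.

    import random

    def true_value(x):                         # toy convex stand-in, not the rental model
        return (x - 3.0) ** 2

    def subgradient(x):
        return 2.0 * (x - 3.0)

    cuts = []                                  # list of (intercept, slope) pairs

    def approx_value(x):
        # Piecewise-linear lower approximation: max over all cuts.
        return max((a + b * x for a, b in cuts), default=float("-inf"))

    for _ in range(50):
        x = random.uniform(0.0, 6.0)           # sample a state
        g = subgradient(x)
        cuts.append((true_value(x) - g * x, g))  # cut tangent to V at x

    for x in [0.0, 1.5, 3.0, 4.5, 6.0]:
        print(f"x={x:.1f}  true={true_value(x):6.2f}  approx={approx_value(x):6.2f}")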

Structured Actor-Critic for Managing and Dispensing Public Health Inventory

Yijia Wang and Daniel R. Jiang

Submitted, 2018.

Brief Description: We consider the setting of public health inventory control/dispensing and propose a new actor-critic algorithm that tracks both policy and value function approximations. The algorithm utilizes structure in both the policy and value to improve the empirical convergence rate. We also provide a case study for the problem of dispensing naloxone (an overdose reversal drug) via mobile needle exchange clinics amidst the ongoing opioid crisis.
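
A hedged sketch of the actor-critic skeleton on a toy inventory chain. The dynamics, rewards, and the specific monotone projection below are illustrative stand-ins, not the paper's model; the point is only that both the critic and the actor are projected back onto a structured class after each update.

    import random

    N = 10                                        # discretized inventory levels
    V = [0.0] * N                                 # critic: value approximation
    policy = [0.5] * N                            # actor: prob. of "dispense"

    def project_monotone(v):
        # Illustrative structural projection: enforce nondecreasing-in-inventory
        # estimates by forward clipping.
        out = list(v)
        for i in range(1, N):
            out[i] = max(out[i], out[i - 1])
        return out

    for _ in range(2000):
        s = random.randrange(N)
        a = 1 if random.random() < policy[s] else 0
        r = float(a) if s > 0 else -float(a)      # toy reward: dispensing helps if stocked
        s2 = max(s - a, 0)
        td = r + 0.95 * V[s2] - V[s]              # TD error drives both updates
        V[s] += 0.05 * td
        policy[s] = min(1.0, max(0.0, policy[s] + 0.01 * td * (a - policy[s])))
        V = project_monotone(V)                   # structured critic
        policy = project_monotone(policy)         # structured actor

    print([round(p, 2) for p in policy])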

Feedback-Based Tree Search for Reinforcement Learning

Daniel R. Jiang, Emmanuel Ekwedike, and Han Liu

International Conference on Machine Learning, ICML 2018.

Brief Description: We describe a technique that iteratively applies MCTS on batches of small, finite-horizon versions of the original infinite-horizon MDP. We show that a deep neural network implementation of the technique can create a competitive AI agent for a popular multiplayer online battle arena (MOBA) game.
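
A hedged sketch of the outer loop, with a one-ply stand-in for the full MCTS and toy chain dynamics (both are illustrative assumptions):

    import random

    GAMMA = 0.99

    def step(state, action):                  # toy dynamics, illustrative only
        return min(state + action, 20), float(action == 1)

    def search(state, value_fn):
        # Stand-in for MCTS on a small finite-horizon truncation: pick the action
        # maximizing immediate reward plus the bootstrapped leaf value.
        candidates = []
        for a in range(3):
            s2, r = step(state, a)
            candidates.append((r + GAMMA * value_fn(s2), a))
        return max(candidates)

    V = [0.0] * 21                            # tabular value approximation
    for _ in range(500):                      # batches of short searches
        s = random.randrange(21)
        for _ in range(5):                    # finite-horizon episode
            q, a = search(s, lambda x: V[x])
            V[s] += 0.1 * (q - V[s])          # feed search results back into V
            s, _ = step(s, a)

    print(round(V[0], 2), round(V[20], 2))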

Practicality of Nested Risk Measures for Dynamic Electric Vehicle Charging

Daniel R. Jiang and Warren B. Powell

Major revision at Manufacturing & Service Operations Management, 2019.

Brief Description: Risk-averse MDPs formulated with nested (dynamic) risk measures are often used as a tool for solving problems with predefined "practical" risk and reward metrics. In this paper, we study the extent to which the two sides of this framework are compatible with each other in the setting of dynamic EV charging: roughly speaking, does a "more risk-averse" MDP also provide lower risk in the practical sense?
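
As a hedged illustration in LaTeX notation (the paper's exact formulation may differ), the nested model applies a one-step risk measure recursively:

    V_t(s) = \min_{a} \; \rho_t\big( c_t(s, a, W_{t+1}) + V_{t+1}(S_{t+1}) \big),
    \qquad \text{e.g. } \rho_t = (1 - \lambda)\, \mathbb{E}[\,\cdot\,] + \lambda\, \mathrm{CVaR}_{\alpha}[\,\cdot\,],

whereas a "practical" metric applies a single risk measure to the total cost, e.g. \mathrm{CVaR}_{\alpha}\big( \sum_t c_t \big). The question is whether increasing \lambda (or tightening \alpha) in the nested model reliably lowers the practical metric.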

Optimistic Monte Carlo Tree Search with Sampled Information Relaxation Dual Bounds

Daniel R. Jiang, Lina Al-Kanj, and Warren B. Powell

Operations Research, accepted for publication, 2019.

Brief Description: MCTS is a well-known strategy for solving sequential decision problems, particularly in the area of game-play AI. We propose a new technique called Primal-Dual MCTS that utilizes sampled information relaxation bounds (Brown et al., 2010) on potential actions in order to make tree expansion decisions. The approach shows promise when used to optimize the behavior of a driver navigating a graph while operating on a ride-sharing platform.
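
A hedged sketch of the expansion rule only, in Python; dual_bound() and the toy action-value model are stand-ins for the paper's information relaxation machinery:

    import random

    def clairvoyant_value(action):
        # Toy model: an optimistic evaluation of the action along one sampled
        # future scenario, illustrative only.
        return action * 0.5 + random.gauss(0.5, 1.0)

    def dual_bound(action, num_samples=10):
        # Stand-in: average of optimistic "clairvoyant" evaluations over sampled
        # scenarios, which upper-bounds the true action value in expectation.
        return sum(clairvoyant_value(action) for _ in range(num_samples)) / num_samples

    expanded = {0: 1.2}                            # action -> current value estimate
    for a in [1, 2, 3]:                            # candidate expansions
        best = max(expanded.values())
        if dual_bound(a) > best:                   # promising under the dual bound
            expanded[a] = 0.0                      # expand and start estimating
    print("expanded actions:", sorted(expanded))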

Shape Constraints in Economics and Operations Research

Andrew L. Johnson and Daniel R. Jiang

Statistical Science, 33(4), pp. 527-546, 2018.

Brief Description: This paper reviews an illustrative set of research on shape constrained estimation in the economics and operations research literature. We highlight the methodological innovations and applications, with a particular emphasis on utility functions, production economics, and sequential decision making applications.
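
As a concrete (and deliberately simple) instance of shape-constrained estimation, a hedged sketch using isotonic regression from scikit-learn on synthetic monotone data:

    import numpy as np
    from sklearn.isotonic import IsotonicRegression

    rng = np.random.default_rng(0)
    x = np.linspace(0, 10, 60)
    y = np.log1p(x) + rng.normal(0, 0.15, size=x.size)  # noisy monotone signal

    # Fit the least-squares estimate constrained to be nondecreasing; a utility
    # or production function estimate would replace y with real observations.
    fit = IsotonicRegression(increasing=True).fit(x, y)
    y_hat = fit.predict(x)                              # monotone by construction
    print("violations of monotonicity:", int((np.diff(y_hat) < -1e-12).sum()))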

Risk-Averse Approximate Dynamic Programming with Quantile-Based Risk Measures

Daniel R. Jiang and Warren B. Powell

Mathematics of Operations Research, 43(2), pp. 554-579, 2018.

Brief Description: We propose a new Q-learning algorithm and a companion sampling procedure to solve risk-averse Markov decision processes under a class of dynamic quantile-based risk measures. Convergence results are proven and an application to energy storage is shown.
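
A hedged sketch of the key sampling primitive, quantile estimation by stochastic approximation, for a single state-action pair; the paper embeds updates like this in a full Q-learning scheme with convergence guarantees:

    import random

    TAU = 0.9                                   # quantile level of the risk measure
    q = 0.0                                     # running quantile estimate
    for n in range(1, 5001):
        cost = random.gauss(1.0, 1.0)           # sampled one-step cost-to-go
        step = 1.0 / n
        # Robbins-Monro update: the fixed point satisfies P(cost <= q) = TAU,
        # i.e., q converges to the TAU-quantile.
        q += step * (TAU - (1.0 if cost <= q else 0.0))

    print(f"estimated {TAU}-quantile: {q:.2f} (true is about 2.28 for N(1,1))")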

An Approximate Dynamic Programming Algorithm for Monotone Value Functions

Daniel R. Jiang and Warren B. Powell

Operations Research, 63(6), pp. 1489-1511, 2015.

Brief Description: We describe a provably convergent algorithm to exploit the structural property of monotonicity that arises in many applications in operations research, finance, and economics. We show via simulations that near optimal solutions can be obtained using the proposed method when the exact approach is computationally intractable.
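
A hedged sketch of the projection step for a scalar state, assuming the value function is nondecreasing; the toy observation model is illustrative only:

    import random

    V = [0.0] * 11                               # value estimates over 11 states
    for _ in range(300):
        s = random.randrange(11)
        obs = 2.0 * s + random.gauss(0, 1)       # noisy observation of V(s)
        V[s] += 0.3 * (obs - V[s])
        # Projection onto monotone functions relative to the updated state:
        for t in range(11):
            if t > s:
                V[t] = max(V[t], V[s])           # states above s can't be worth less
            elif t < s:
                V[t] = min(V[t], V[s])           # states below s can't be worth more

    print([round(v, 1) for v in V])              # nondecreasing in s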

Optimal Hour-Ahead Bidding in the Real-Time Electricity Market with Battery Storage using Approximate Dynamic Programming

Daniel R. Jiang and Warren B. Powell

INFORMS Journal on Computing, 27(3), pp. 525-543, 2015.

Brief Description: We formulate a mathematical model for bidding in the real-time market with the goal of performing energy arbitrage (i.e., exploiting variations in spot prices to profit) in the presence of storage. We train and test an approximate dynamic programming policy on real spot price data from the NYISO and show its value over heuristic policies used in industry.
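
A hedged sketch of the decision structure only: fixed heuristic bids on synthetic prices, standing in for the ADP policy trained on NYISO data.

    import random

    # An hour ahead, the agent posts a buy bid and a sell bid; the battery
    # charges when the spot price clears below the buy bid and discharges when
    # it clears above the sell bid. Prices and bids below are toy values.
    CAPACITY, BUY_BID, SELL_BID = 10.0, 25.0, 45.0
    charge, profit = 0.0, 0.0

    for hour in range(24 * 30):
        price = max(0.0, random.gauss(35.0, 12.0))     # toy spot price ($/MWh)
        if price <= BUY_BID and charge < CAPACITY:     # buy bid clears: charge 1 MWh
            charge += 1.0
            profit -= price
        elif price >= SELL_BID and charge > 0.0:       # sell bid clears: discharge
            charge -= 1.0
            profit += price

    print(f"30-day arbitrage profit: ${profit:,.0f}")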

Teaching

@ University of Pittsburgh

Approximate Dynamic Programming, Ph.D. Level

Instructor, Spring 2017, Fall 2018

Decision Models, Undergraduate/Master's Level

Instructor, Fall 2016, Fall 2017, Fall 2018

IE 2186

Reinforcement Learning, Master's Level

Instructor, Summer 2018

@ Princeton University

ORF 309

Probability and Stochastic Systems, Instructor: Prof. Ramon van Handel

Assistant in Instruction, Fall 2015

STWG

Senior Thesis Writing Group, ORFE Department

Group Leader, Academic Years 2013-2016

ORF 411

Operations and Information Engineering, Instructor: Prof. Warren B. Powell

Assistant in Instruction, Fall 2013, Fall 2014

ORF 418

Optimal Learning, Instructor: Prof. Warren B. Powell

Assistant in Instruction, Spring 2013

Other Projects

What Would I Say: Read the New Yorker, CNN, and Telegraph articles about the project we created at Hack Princeton 2013, which uses Markov chains to simulate a user's social media posts. The site has generated over 17 million page views from more than 9 million unique users.
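
A hedged sketch of the core mechanic, a first-order word-level Markov chain (the corpus below is a stand-in for a user's posts):

    import random
    from collections import defaultdict

    corpus = "i love dynamic programming i love tree search i love coffee".split()

    chain = defaultdict(list)
    for w1, w2 in zip(corpus, corpus[1:]):     # record first-order transitions
        chain[w1].append(w2)

    word, post = random.choice(corpus), []
    for _ in range(8):
        post.append(word)
        if word not in chain:                  # dead end: no observed successor
            break
        word = random.choice(chain[word])      # sample the next word
    print(" ".join(post))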

Fantasy Football: Use this app to quantify the role of luck in (Yahoo!) Fantasy Football by generating probability distributions of your record over randomized season schedules.
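
A hedged sketch of the idea: fix each team's weekly scores and randomize opponents to estimate the distribution of final records (scores below are synthetic; the app uses real league data):

    import random
    from collections import Counter

    weeks = 13
    teams = {t: [random.gauss(100, 15) for _ in range(weeks)] for t in range(10)}
    my_team = 0

    records = Counter()
    for _ in range(10_000):                     # randomized season schedules
        wins = 0
        for w in range(weeks):
            opp = random.choice([t for t in teams if t != my_team])
            wins += teams[my_team][w] > teams[opp][w]
        records[wins] += 1

    for wins in sorted(records):                # how lucky was your record?
        print(f"{wins:2d} wins: {records[wins] / 10_000:.1%}")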