Structured Learning for Stochastic Optimization: Algorithm Design and Convergence


The PhD project deals with decision making under uncertainty, with a particular focus on stochastic OR problems. Such problems, in a stylised known modelling setting (e.g., Markov chains), are solved by first formulating the so-called Bellman equations and then by implementing a numerical algorithm (e.g., value iteration). In a more general setting (e.g., when the model or its underlying dynamics are unknown), reinforcement learning can be used to estimate the solution of the Bellman equations. This project aims to overcome the challenges that commonly occur during the solving process.

The first major challenge when computing the aforementioned solutions is the so called curse of dimensionality, as the solving algorithms are computationally expensive. We propose to reduce the computational complexity by focusing on the structure of the SOR problem at hand. For example, for regenerative models or for rewards/costs only collected in the final state, a first step argument can be used to formulate a computationally efficient system of equations. These equations have the same solution as the Bellman equations, but require significantly less  iterations. 

The idea of utilising the underlying structure can also be extended to overcome the second challenge: most reinforcement learning algorithms are not guaranteed to converge. This is either due to the structure of the objective function or due to an approximation step implemented within the algorithm. In this project emphasis will be placed on addressing both causes and on improving the algorithms’ convergence. 

For the former cause, note that typically in reinforcement learning one uses a generic objective function, for instance the cross entropy, which might not possess any structural properties (such as convexity in a minimisation problem). We intend to overcome this challenge by appropriately selecting the underlying function so as to ensure that the minimisation (maximisation) problem is ideally convex (concave). 

For the latter cause, in most reinforcement learning algorithms a neural network is used to approximate the solution of the Bellman equations. This approximation can lead to the algorithms not converging to the theoretical solution. We intend to reduce the inaccuracy that results from the approximation, by equipping the neural network with the necessary structure coming from the problem at hand.

Supervisors Stella Kapodistria (TU/e) and Sem Borst (TU/e)
PhD student Wessel Blomerus
Location Eindhoven University of Technology


This project has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement Grant Agreement No 945045.