One of the key challenges that investors face is determining financial strategies that are optimal for multi-period investment objectives. Such objectives arise naturally, and can be encountered in different investment areas, such as life cycle investing, structured fund solutions, and multi-period strategic asset allocation, among others.

In general, these multi-period optimization problems can be very difficult to solve. In trying to do so, practitioners typically apply heuristics, such as age-based investing for life cycle investing, or Monte Carlo simulations with different predefined financial strategies. These strategies are mostly derived from single-period optimizations or historical best practices. The one with the best outcome in terms of the investor’s multi-period objectives is then selected. Another - more academic - approach to solving such problems is to use “numerical dynamic programming” (see Sutton & Barto, 2018 for more on this topic). Where applicable, it can identify the optimal financial strategy, but in many cases it cannot be applied due to the “curse of dimensionality”: the computational effort required grows exponentially with the number of state variables in the model (see Taylor, 2019).
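To make the simulation heuristic concrete, the following is a minimal sketch in Python and not the method used in this paper. It simulates terminal wealth for a handful of predefined strategies and selects the one with the highest average utility. The return model (lognormal equity returns and a constant risk-free rate), all parameter values, the CRRA utility function, and the strategy names are illustrative assumptions.

```python
import math
import random
import statistics


def simulate_terminal_wealth(equity_weights, n_paths=2000, seed=0,
                             mu=0.06, sigma=0.18, rf=0.02):
    """Simulate terminal wealth of one unit invested under a predefined strategy.

    equity_weights holds one equity weight per annual period.
    mu, sigma and rf are assumed values: equity returns are lognormal with
    expected gross return exp(mu) and volatility sigma; rf is a constant rate.
    """
    rng = random.Random(seed)  # same seed for every strategy: common random numbers
    outcomes = []
    for _ in range(n_paths):
        wealth = 1.0
        for w in equity_weights:
            equity_return = math.exp(rng.gauss(mu - 0.5 * sigma ** 2, sigma)) - 1.0
            wealth *= 1.0 + w * equity_return + (1.0 - w) * rf
        outcomes.append(wealth)
    return outcomes


def expected_crra_utility(outcomes, gamma=5.0):
    """Average CRRA utility of terminal wealth (gamma is an assumed risk aversion)."""
    return statistics.fmean(w ** (1.0 - gamma) / (1.0 - gamma) for w in outcomes)


horizon = 30  # assumed investment horizon in years

# Hypothetical predefined strategies, expressed as equity weights per year.
strategies = {
    "all_bonds": [0.0] * horizon,
    "constant_60_40": [0.6] * horizon,
    "age_based_glide_path": [0.9 - 0.6 * t / (horizon - 1) for t in range(horizon)],
    "all_equity": [1.0] * horizon,
}

# Score every candidate on the multi-period objective and keep the best one.
scores = {name: expected_crra_utility(simulate_terminal_wealth(weights))
          for name, weights in strategies.items()}
best = max(scores, key=scores.get)
print(best, scores[best])
```

The selected strategy is only the best of the candidates supplied. Nothing in this procedure searches beyond the predefined list, which is the limitation that motivates the optimization methods discussed next.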

The above challenges have led us to Deep Reinforcement Learning (DRL), a very promising AI-driven method for gaining additional insights into financial strategy optimization that can supplement the approaches already mentioned. Over the last few years, DRL has been successfully applied to multi-period financial strategy optimization in the academic literature. Two seminal papers are Buehler, Gonon, Teichmann, & Wood, 2018, in which the authors presented a framework for hedging a portfolio of derivatives, and Duarte, Fonseca, Goodman, & Parker, 2021, in which optimal portfolio choices were derived in a complex life cycle model. Both studies successfully used DRL.

The purpose of this short introductory paper is to demonstrate how DRL can be applied in practice. We explain the topic conceptually and discuss its advantages - mainly flexibility and generality - as well as its disadvantages - limited explainability and the risk of converging to locally optimal solutions - and how to address them. We show readers how to implement DRL, and then apply it to a well-known, already solved life cycle consumption problem as a proof of concept.

This paper is organized as follows: in section one we describe what DRL is and how it can be applied to solve multi-period financial optimization tasks. In section two we give a short overview of different areas in asset management where multi-period financial strategy optimization is relevant, and we present an application to life cycle investing as a proof of concept. Finally, section three concludes and gives an outlook.

