
Continuous-Action Reinforcement Learning for Portfolio Allocation of a Life Insurance Company

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12978)

Abstract

The asset management of an insurance company is more complex than traditional portfolio management due to the presence of obligations that the insurance company must fulfill toward its clients. These obligations, commonly referred to as liabilities, are payments whose magnitude and occurrence depend both on the insurance contracts signed with the clients and on the portfolio's performance.

In particular, while clients must be refunded in case of adverse events, such as car accidents or death, they also contribute to a common financial portfolio so as to earn annual returns. Customer withdrawals may increase whenever these returns are too low; moreover, in the presence of an annual minimum guarantee, the company must make up the difference whenever returns fall short of it. Hence, in this context, no investment strategy can ignore the interdependency between financial assets and liabilities.
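
To make the guarantee mechanism concrete, here is a small worked illustration; the notation (portfolio return r_t, guaranteed rate g) is ours and does not appear in the paper.

```latex
% Illustrative only: notation (r_t, g) is ours, not taken from the paper.
% The client is credited at least the guaranteed rate, and the company
% absorbs the shortfall whenever the portfolio underperforms the guarantee.
r_t^{\mathrm{client}} = \max(r_t,\, g),
\qquad
\mathrm{shortfall}_t = \max(g - r_t,\, 0)
% Example: with g = 1\% and r_t = 0.2\%, the company tops up 0.8\%.
```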

To deal with this problem, we present a stochastic model that combines portfolio returns with the liabilities generated by the insurance products offered by the company. Furthermore, we propose a risk-adjusted optimization problem to maximize the capital of the company over a pre-determined time horizon.
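
The abstract does not state the objective in closed form; as an illustration only, a standard mean-variance instance of such a risk-adjusted capital-maximization criterion over a horizon T could read:

```latex
% An assumption for illustration: one common risk-adjusted objective of
% this kind, not necessarily the exact criterion optimized in the paper.
% C_T^\pi is the company's capital at horizon T under allocation policy \pi,
% and \lambda > 0 trades expected capital against its variability.
\max_{\pi}\;\; \mathbb{E}\!\left[C_T^{\pi}\right] \;-\; \lambda\,\mathrm{Var}\!\left[C_T^{\pi}\right]
```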

Since traditional financial tools are inadequate for such a setting, we develop the model as a Markov Decision Process. In this way, we can use Reinforcement Learning algorithms to solve the underlying optimization problem. Finally, we provide experiments that show how the optimal asset allocation can be found by training an agent with the algorithm Deep Deterministic Policy Gradient.
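
To ground the pipeline the abstract describes, below is a minimal self-contained PyTorch sketch: a DDPG agent (actor, critic, replay buffer, soft target updates) trained on a toy asset-liability environment with a minimum-guarantee liability. The environment dynamics, reward shaping, network sizes, and hyperparameters are all illustrative assumptions, not the authors' actual model.

```python
# A minimal, self-contained sketch of the setup described in the abstract:
# a DDPG agent (Lillicrap et al., 2015) allocating a portfolio in a toy
# asset-liability environment. Everything here (dynamics, reward,
# hyperparameters) is an illustrative assumption, not the paper's model.
import copy, random
from collections import deque

import numpy as np
import torch
import torch.nn as nn

N_ASSETS, HORIZON = 4, 10
GAMMA, TAU, LAMBDA = 0.99, 0.005, 2.0

class ToyALMEnv:
    """Wealth compounds with risky returns; whenever the portfolio return
    falls below the guaranteed rate g, the insurer pays the shortfall."""
    def __init__(self, g=0.01, seed=0):
        self.g, self.rng = g, np.random.default_rng(seed)
    def reset(self):
        self.t, self.wealth = 0, 1.0
        return self._obs()
    def _obs(self):
        return np.array([self.wealth, self.t / HORIZON], dtype=np.float32)
    def step(self, w):
        r = self.rng.normal(0.02, 0.05, N_ASSETS)       # one-period asset returns
        ret = float(w @ r)
        shortfall = max(self.g - ret, 0.0)              # guarantee top-up paid by firm
        self.wealth *= 1.0 + ret - shortfall
        self.t += 1
        reward = ret - shortfall - LAMBDA * ret ** 2    # crude risk adjustment
        return self._obs(), reward, self.t >= HORIZON

def mlp(din, dout):
    return nn.Sequential(nn.Linear(din, 64), nn.ReLU(), nn.Linear(64, dout))

actor = nn.Sequential(mlp(2, N_ASSETS), nn.Softmax(dim=-1))  # long-only weights
critic = mlp(2 + N_ASSETS, 1)
actor_t, critic_t = copy.deepcopy(actor), copy.deepcopy(critic)
opt_a = torch.optim.Adam(actor.parameters(), lr=1e-3)
opt_c = torch.optim.Adam(critic.parameters(), lr=1e-3)
buffer = deque(maxlen=50_000)

def act(s, noise=0.1):
    with torch.no_grad():
        w = actor(torch.as_tensor(s)).numpy()
    w = np.clip(w + noise * np.random.randn(N_ASSETS), 1e-6, None)
    return w / w.sum()                                  # renormalize after noise

env = ToyALMEnv()
for episode in range(200):
    s, done = env.reset(), False
    while not done:
        a = act(s)
        s2, rwd, done = env.step(a)
        buffer.append((s, a, rwd, s2, float(done)))
        s = s2
        if len(buffer) < 256:
            continue
        S, A, R, S2, D = map(
            lambda x: torch.as_tensor(np.array(x), dtype=torch.float32),
            zip(*random.sample(buffer, 128)))
        with torch.no_grad():                           # Bellman target
            q2 = critic_t(torch.cat([S2, actor_t(S2)], 1)).squeeze(1)
            y = R + GAMMA * (1 - D) * q2
        q = critic(torch.cat([S, A], 1)).squeeze(1)
        loss_c = nn.functional.mse_loss(q, y)
        opt_c.zero_grad(); loss_c.backward(); opt_c.step()
        loss_a = -critic(torch.cat([S, actor(S)], 1)).mean()
        opt_a.zero_grad(); loss_a.backward(); opt_a.step()
        for net, tgt in ((actor, actor_t), (critic, critic_t)):
            for p, pt in zip(net.parameters(), tgt.parameters()):
                pt.data.mul_(1 - TAU).add_(TAU * p.data)  # soft target update
print("final-episode wealth:", env.wealth)
```

The softmax output layer is one simple way to encode the continuous allocation constraint: it keeps the action on the portfolio simplex (non-negative weights summing to one), so every action the agent emits is a valid long-only allocation.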


Notes

  1. The other three assets are omitted as they go to zero very quickly.


Acknowledgements

The research was conducted under a cooperative agreement between ISI Foundation, Intesa Sanpaolo Innovation Center, and Intesa Sanpaolo Vita. The authors would like to thank Lauretta Filangieri, Antonino Galatà, Giuseppe Loforese, Pietro Materozzi and Luigi Ruggerone for their useful comments.

Author information

Correspondence to Alan Perotti.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Abrate, C. et al. (2021). Continuous-Action Reinforcement Learning for Portfolio Allocation of a Life Insurance Company. In: Dong, Y., Kourtellis, N., Hammer, B., Lozano, J.A. (eds) Machine Learning and Knowledge Discovery in Databases. Applied Data Science Track. ECML PKDD 2021. Lecture Notes in Computer Science, vol 12978. Springer, Cham. https://doi.org/10.1007/978-3-030-86514-6_15

  • DOI: https://doi.org/10.1007/978-3-030-86514-6_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86513-9

  • Online ISBN: 978-3-030-86514-6

  • eBook Packages: Computer Science (R0)
