Exploring the Landscapes and Emerging Trends of Reinforcement Learning from 1990 to 2020: A Bibliometric Analysis

Zeng, Li; Yin, Xiaoqing; Li, Yang; Li, Zili

doi:10.1007/978-3-030-78811-7_35

Li Zeng ORCID: orcid.org/0000-0002-4219-788X

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12690)

Included in the following conference series:

International Conference on Swarm Intelligence

Abstract

Reinforcement Learning (RL) has become increasingly important in recent years, driven by the huge success of AlphaGo and AlphaZero. However, the technique is not a new research topic; it originates from the well-developed dynamic programming method. In this paper, we explore the history of RL over the last 30 years from a bibliometric perspective to capture its landscapes and emerging trends. We conduct a comprehensive assessment of RL based on RL-related articles in the SCI database from 1990 to 2020. The results indicate that RL research has grown significantly over the past three decades, with a total of 9344 articles covering 96 countries/territories. The most productive countries are the USA, China, England, Japan, Germany and Canada. A total of 4507 research institutes are involved in the field, and the most productive among them are the Chinese Academy of Sciences, University College London, Beijing University of Posts and Telecommunications, Tsinghua University, Northeastern University and Princeton University. In addition, the frequently adopted keywords with the strongest citation bursts are Genetic Algorithm, Dynamic Programming, Q-Learning, Mobile Robot, Wireless Sensor Network, Smart Grid, Big Data, Inverse Reinforcement Learning and Cognitive Radio, which indicate the emerging trends of the field. The results provide a dynamic view of the evolution of “Reinforcement Learning” research landscapes and trends from various perspectives, which can serve as a potential guide for future research, and the approach could also be adopted to analyze other research topics.

L. Zeng and X. Yin contributed equally to this work.

Author information

Authors and Affiliations

College of Advanced Interdisciplinary Studies, National University of Defense Technology, Changsha, 410073, China
Li Zeng & Xiaoqing Yin
Hunan Stringle Technology Co., Ltd., Changsha, 410073, China
Yang Li
High-Tech Research Institute, Hunan Institute of Traffic Engineering, Changsha, 410073, China
Zili Li

Editor information

Editors and Affiliations

Peking University, Beijing, China
Ying Tan
Southern University of Science and Technology, Shenzhen, China
Yuhui Shi

About this paper

Cite this paper

Zeng, L., Yin, X., Li, Y., Li, Z. (2021). Exploring the Landscapes and Emerging Trends of Reinforcement Learning from 1990 to 2020: A Bibliometric Analysis. In: Tan, Y., Shi, Y. (eds) Advances in Swarm Intelligence. ICSI 2021. Lecture Notes in Computer Science, vol 12690. Springer, Cham. https://doi.org/10.1007/978-3-030-78811-7_35

DOI: https://doi.org/10.1007/978-3-030-78811-7_35
Published: 07 July 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-78810-0
Online ISBN: 978-3-030-78811-7
eBook Packages: Computer Science, Computer Science (R0)
