Abstract
Reinforcement learning (RL) has become increasingly important in recent years, driven by the huge success of AlphaGo and AlphaZero. However, the technique is not a new research topic: it originates from the well-developed dynamic programming method. In this paper, we explore the history of RL over the last 30 years from a bibliometric perspective, to capture its landscapes and emerging trends. We conduct a comprehensive assessment of RL research based on RL-related articles in the SCI database from 1990 to 2020. The results indicate that RL research has grown significantly over the past three decades, comprising a total of 9344 articles from 96 countries/territories. The most productive countries are the USA, China, England, Japan, Germany and Canada. A total of 4507 research institutes are involved in the field, and the most productive among them are the Chinese Academy of Sciences, University College London, Beijing University of Posts and Telecommunications, Tsinghua University, Northeastern University and Princeton University. In addition, the frequently adopted keywords with the strongest citation bursts are Genetic Algorithm, Dynamic Programming, Q-Learning, Mobile Robot, Wireless Sensor Network, Smart Grid, Big Data, Inverse Reinforcement Learning and Cognitive Radio, which indicate the emerging trends of the field. The results presented in this paper provide a dynamic view of the evolution of “Reinforcement Learning” research landscapes and trends from various perspectives, which can serve as a guide for future research, and the same methodology could also be adopted to analyze other research topics.
L. Zeng and X. Yin contributed equally to this work.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Zeng, L., Yin, X., Li, Y., Li, Z. (2021). Exploring the Landscapes and Emerging Trends of Reinforcement Learning from 1990 to 2020: A Bibliometric Analysis. In: Tan, Y., Shi, Y. (eds) Advances in Swarm Intelligence. ICSI 2021. Lecture Notes in Computer Science, vol 12690. Springer, Cham. https://doi.org/10.1007/978-3-030-78811-7_35
DOI: https://doi.org/10.1007/978-3-030-78811-7_35
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-78810-0
Online ISBN: 978-3-030-78811-7
eBook Packages: Computer Science, Computer Science (R0)