Exploring the Landscapes and Emerging Trends of Reinforcement Learning from 1990 to 2020: A Bibliometric Analysis

  • Conference paper

Advances in Swarm Intelligence (ICSI 2021)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12690)

Abstract

Reinforcement Learning (RL) has become increasingly important in recent years, following the huge success of AlphaGo and AlphaZero. However, this technique is not a newly born research topic; it originates from the well-developed dynamic programming method. In this paper, we explore the history of RL over the last 30 years from a bibliometric perspective, to capture its landscapes and emerging trends. We conduct a comprehensive assessment of RL based on articles related to RL in the SCI database from 1990 to 2020. The results indicate that reinforcement learning research has grown significantly over the past three decades, comprising a total of 9344 articles covering 96 countries/territories. The most productive countries are the USA, China, England, Japan, Germany and Canada. There are 4507 research institutes involved in the field of RL, and the most productive among them are the Chinese Academy of Sciences, University College London, Beijing University of Posts and Telecommunications, Tsinghua University, Northeastern University and Princeton University. In addition, the most frequently adopted keywords with the strongest citation bursts are Genetic Algorithm, Dynamic Programming, Q-Learning, Mobile Robot, Wireless Sensor Network, Smart Grid, Big Data, Inverse Reinforcement Learning and Cognitive Radio, which reveal the emerging trends of this field. We claim that the results presented in this paper provide a dynamic view of the evolution of “Reinforcement Learning” research landscapes and trends from various perspectives, which can serve as a guide for future research, and that the approach we demonstrate can also be adopted to analyze other research topics.
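As a rough illustration of the kind of tallying behind these statistics, the sketch below counts publications per year, per country, and per keyword from a hypothetical CSV export of SCI/Web of Science records. It is a minimal sketch under assumed inputs, not the paper's actual pipeline: the file name and the column names ("year", "countries", "keywords"), as well as the semicolon-separated list format, are placeholders to adapt to a real export.

import csv
from collections import Counter

def tally_records(path):
    # Count publications per year, per country, and per keyword
    # from a CSV file with (assumed) columns: year, countries, keywords.
    per_year, per_country, per_keyword = Counter(), Counter(), Counter()
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            per_year[row["year"]] += 1
            # Both list fields are assumed to be semicolon-separated.
            for country in row["countries"].split(";"):
                per_country[country.strip()] += 1
            for keyword in row["keywords"].split(";"):
                per_keyword[keyword.strip().lower()] += 1
    return per_year, per_country, per_keyword

if __name__ == "__main__":
    # "rl_records_1990_2020.csv" is a placeholder for an exported record file.
    years, countries, keywords = tally_records("rl_records_1990_2020.csv")
    print("Publications per year:", dict(sorted(years.items())))
    print("Most productive countries:", countries.most_common(5))
    print("Most frequent keywords:", keywords.most_common(10))

Identifying the keywords with the strongest citation bursts, as reported in the abstract, is typically done by running a burst-detection algorithm such as Kleinberg's over the resulting per-year keyword counts, for example via bibliometric tools like CiteSpace.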

L. Zeng and X. Yin contributed equally to this work.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Zeng, L., Yin, X., Li, Y., Li, Z. (2021). Exploring the Landscapes and Emerging Trends of Reinforcement Learning from 1990 to 2020: A Bibliometric Analysis. In: Tan, Y., Shi, Y. (eds) Advances in Swarm Intelligence. ICSI 2021. Lecture Notes in Computer Science, vol 12690. Springer, Cham. https://doi.org/10.1007/978-3-030-78811-7_35

  • DOI: https://doi.org/10.1007/978-3-030-78811-7_35

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-78810-0

  • Online ISBN: 978-3-030-78811-7

  • eBook Packages: Computer Science, Computer Science (R0)
