
Parallel Random Embedding with Negatively Correlated Search

  • Conference paper
Advances in Swarm Intelligence (ICSI 2021)

Abstract

Evolutionary algorithms (EAs) have proved to be a promising approach to parameter optimization in deep reinforcement learning (RL) in recent years. However, they still suffer from the curse of dimensionality when dealing with high-dimensional inputs. Based on experiments, we observe that only a few variables contribute significantly to the performance of a large-scale RL policy. Motivated by this, we propose a parallel random embedding framework that optimizes strategies over multiple parameter subspaces, allowing classical evolutionary algorithms and techniques to be applied to million-scale RL policy optimization. Experiments show that our approach achieves strong performance when Negatively Correlated Search (NCS) is used within the framework.
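The core idea of the framework, optimizing a high-dimensional policy through a fixed low-dimensional random embedding, can be sketched as follows. The dimensions, the Gaussian projection matrix, the toy fitness function, and the simple (1+1)-style search are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

D = 10_000   # high-dimensional policy parameter space (illustrative)
d = 100      # low-dimensional embedding dimension

# Random embedding: a fixed Gaussian matrix maps a low-dimensional
# candidate y back into the full parameter space as x = A @ y.
A = rng.standard_normal((D, d)) / np.sqrt(d)

def fitness(x):
    # Placeholder for an RL policy evaluation; here a toy objective
    # that depends on only a few "effective" dimensions of x.
    return -np.sum((x[:5] - 1.0) ** 2)

def optimize_in_subspace(A, iters=200, sigma=0.1):
    """Simple (1+1)-style random search in the embedded subspace."""
    y = np.zeros(A.shape[1])
    best = fitness(A @ y)
    for _ in range(iters):
        cand = y + sigma * rng.standard_normal(A.shape[1])
        f = fitness(A @ cand)
        if f > best:      # keep the candidate only if it improves
            y, best = cand, f
    return y, best

y, best = optimize_in_subspace(A)
print(best)
```

Because only a few variables matter, a search confined to the d-dimensional subspace can still improve the objective; the parallel version of the framework runs several such subspaces with different embedding matrices at once.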

This work is supported by the Natural Science Foundation of China (Grant No. 61806090 and Grant No. 61672478), Guangdong Provincial Key Laboratory (Grant No. 2020B121201001), the Program for Guangdong Introducing Innovative and Entrepreneurial Teams (Grant No. 2017ZT07X386), Shenzhen Science and Technology Program (Grant No. KQTD2016112514355531), the Program for University Key Laboratory of Guangdong Province (Grant No. 2017KSYS008).




Author information


Corresponding author

Correspondence to Ke Tang.


Appendix


Hyper-parameters. In NCSRE, the number of populations is set to 7 and the population size of each sub-process is set to 2. For a network with 1.7 million parameters, we set the embedding dimension to 100, balancing computational resource limitations against optimization effectiveness. Generally speaking, algorithms with a 1000-dimensional effective space can achieve better performance if matrix-product acceleration techniques are adopted. For NCS-C, the learning rate of sigma and the adaptive negative-correlation coefficient are set as in the original paper, and the initial value of sigma is found by grid search and set to 0.01 with the bound [-0.1, 0.1]. The bound on the parameters defines our search space and should be carefully confirmed for different RL problems. We determined the bound of the D-dimensional space according to existing research: most parameters in a well-performing network lie in [-10, 10]. In our experiments, alpha is set to a constant for convenience. Notably, the lengths of the phase and the epoch must correspond to each other to match the sub-process and the random embedding framework. In NCS-C, each sigma is adapted every epoch iterations, and epoch is usually set to 5 or 10. As for \(L_{phase}\), there is a two-edged effect: a phase that is too long leaves fewer embedding spaces, while one that is too short leads to insufficient optimization in each subspace. Considering these factors, we set them to 30 and 5, respectively. Moreover, each network is re-evaluated t times to reduce the noise in the games, and the average score is taken as fitness. The initial value of t is set to 3 and is adaptively increased to 6 as \(3*\frac{t}{t_{max}}\) (Table 2).
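The settings above can be collected in one place as a configuration sketch; the key names are our own choice, and the values follow the text (Table 2 itself is not reproduced here):

```python
# Hyper-parameter settings of NCSRE as described above.
# Key names are illustrative; values follow the appendix text.
NCSRE_CONFIG = {
    "num_populations": 7,          # parallel sub-processes
    "population_size": 2,          # individuals per sub-process
    "embedding_dim": 100,          # low-dimensional subspace (d)
    "sigma_init": 0.01,            # initial NCS-C step size (grid-searched)
    "sigma_bound": (-0.1, 0.1),    # bound used when searching sigma
    "param_bound": (-10.0, 10.0),  # search-space bound in D dimensions
    "L_phase": 30,                 # iterations per random-embedding phase
    "epoch": 5,                    # NCS-C sigma-adaptation interval
    "t_init": 3,                   # initial re-evaluations per network
    "t_max": 6,                    # upper bound on re-evaluations
}
```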

Implementation. The implementation code has been published on GitHub: https://github.com/Desein-Yang/NCS-RL.

Table 2. Recommended hyper-parameters of NCSRE in our experiments.

Network Architecture. The parameters of the policy network architecture used in our experiments are listed in Table 3.

Table 3. The network architecture of RL policy


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Yang, Q., Yang, P., Tang, K. (2021). Parallel Random Embedding with Negatively Correlated Search. In: Tan, Y., Shi, Y. (eds) Advances in Swarm Intelligence. ICSI 2021. Lecture Notes in Computer Science(), vol 12690. Springer, Cham. https://doi.org/10.1007/978-3-030-78811-7_33


  • DOI: https://doi.org/10.1007/978-3-030-78811-7_33

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-78810-0

  • Online ISBN: 978-3-030-78811-7

  • eBook Packages: Computer Science; Computer Science (R0)
