Abstract
Evolutionary algorithms (EAs) have proven to be a promising approach to parameter optimization in deep reinforcement learning (RL) in recent years. However, they still suffer from the curse of dimensionality when dealing with high-dimensional inputs. Based on experiments, we observe that only a few variables contribute significantly to the performance of a large-scale RL policy. Motivated by this observation, we propose a parallel random embedding framework that optimizes strategies over multiple parameter subspaces, allowing classical evolutionary algorithms and techniques to be applied to million-scale RL policy optimization. Experiments show that our approach achieves superior performance when Negatively Correlated Search (NCS) is instantiated in the framework.
This work is supported by the Natural Science Foundation of China (Grant No. 61806090 and Grant No. 61672478), Guangdong Provincial Key Laboratory (Grant No. 2020B121201001), the Program for Guangdong Introducing Innovative and Entrepreneurial Teams (Grant No. 2017ZT07X386), Shenzhen Science and Technology Program (Grant No. KQTD2016112514355531), the Program for University Key Laboratory of Guangdong Province (Grant No. 2017KSYS008).
References
Al-Dujaili, A., Suresh, S.: Embedded bandits for large-scale black-box optimization. In: Singh, S.P., Markovitch, S. (eds.) Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, February 4–9, 2017, San Francisco, California, USA. pp. 758–764. AAAI Press, New York (2017)
Binois, M., Ginsbourger, D., Roustant, O.: On the choice of the low-dimensional domain for global optimization via random embeddings. J. Global Optim. 76(1), 69–90 (2019). https://doi.org/10.1007/s10898-019-00839-1
Carpentier, A., Munos, R.: Bandit theory meets compressed sensing for high dimensional stochastic linear bandit. Proc. Mach. Learn. Res. 22, 190–198 (2012)
Chrabaszcz, P., Loshchilov, I., Hutter, F.: Back to basics: Benchmarking canonical evolution strategies for playing Atari. In: Proceedings of the 27th International Joint Conference on Artificial Intelligence, pp. 1419–1426 (2018)
Conti, E., Madhavan, V., Such, F.P., Lehman, J., Stanley, K.O., Clune, J.: Improving exploration in evolution strategies for deep reinforcement learning via a population of novelty-seeking agents. In: Advances in Neural Information Processing Systems 31: NeurIPS 2018, December 3–8, 2018, Montreal, Canada. pp. 5032–5043 (2018)
Kaban, A., Bootkrajang, J., Durrant, R.J.: Towards large scale continuous EDA: a random matrix theory perspective. In: Proceedings of the Fifteenth Annual Conference on Genetic and Evolutionary Computation, GECCO 2013, p. 383. ACM Press, New York (2013)
Kakade, S.M.: A natural policy gradient. In: Dietterich, T.G., Becker, S., Ghahramani, Z. (eds.) Advances in Neural Information Processing Systems 14 [Neural Information Processing Systems: Natural and Synthetic, NIPS 2001, December 3–8, 2001, Vancouver, British Columbia, Canada], pp. 1531–1538. MIT Press, Cambridge, MA (2001)
Knight, J.N., Lunacek, M.: Reducing the space-time complexity of the CMA-ES. In: Lipson, H. (ed.) Genetic and Evolutionary Computation Conference, GECCO 2007, Proceedings, London, England, UK, July 7–11, 2007, pp. 658–665. ACM Press, New York (2007)
Loshchilov, I.: A computationally efficient limited memory CMA-ES for large scale optimization. In: Arnold, D.V. (ed.) Genetic and Evolutionary Computation Conference, pp. 397–404. ACM Press, New York (2014)
Ma, X., et al.: A survey on cooperative co-evolutionary algorithms. IEEE Trans. Evol. Comput. 23(3), 421–441 (2019)
Machado, M.C., Bellemare, M.G., et al.: Revisiting the arcade learning environment: evaluation protocols and open problems for general agents. J. Artif. Intell. Res. 61(1), 523–562 (2018)
Müller, N., Glasmachers, T.: Challenges in high-dimensional reinforcement learning with evolution strategies. In: Auger, A., Fonseca, C.M., Lourenço, N., Machado, P., Paquete, L., Whitley, D. (eds.) PPSN 2018. LNCS, vol. 11102, pp. 411–423. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-99259-4_33
Mnih, V., et al.: Human-level control through deep reinforcement learning. Nature 518(7540), 529–533 (2015)
Mnih, V., Badia, A.P., et al.: Asynchronous methods for deep reinforcement learning. Proc. Mach. Learn. Res. 48, 1928–1937 (2016)
Mousavi, S.S., Schukat, M., Howley, E.: Deep reinforcement learning: an overview. In: Bi, Y., Kapoor, S., Bhatia, R. (eds.) IntelliSys 2016. LNNS, vol. 16, pp. 426–440. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-56991-8_32
Qian, H., Hu, Y.Q., Yu, Y.: Derivative-free optimization of high-dimensional non-convex functions by sequential random embeddings. In: Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, IJCAI 2016, pp. 1946–1952. AAAI Press, New York (2016)
Qian, H., Yu, Y.: Scaling simultaneous optimistic optimization for high-dimensional non-convex functions with low effective dimensions. In: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, AAAI 2016, pp. 2000–2006. AAAI Press, New York (2016)
Qian, H., Yu, Y.: Solving high-dimensional multi-objective optimization problems with low effective dimensions. In: Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, AAAI 2017, pp. 875–881. AAAI Press, New York (2017)
Salimans, T., Ho, J., et al.: Evolution strategies as a scalable alternative to reinforcement learning. arXiv:1703.03864 (2017)
Sanyang, M.L., Kabán, A.: REMEDA: random embedding EDA for optimising functions with intrinsic dimension. In: Handl, J., Hart, E., Lewis, P.R., López-Ibáñez, M., Ochoa, G., Paechter, B. (eds.) PPSN 2016. LNCS, vol. 9921, pp. 859–868. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-45823-6_80
Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: Proximal policy optimization algorithms. arXiv:1707.06347 (2017)
Such, F.P., Madhavan, V., et al.: Deep neuroevolution: Genetic algorithms are a competitive alternative for training deep neural networks for reinforcement learning. arXiv:1712.06567 (2018)
Tang, K., Yang, P., Yao, X.: Negatively correlated search. IEEE J. Sel. Areas Commun. 34(3), 542–550 (2016)
Wang, Z., Zoghi, M., et al.: Bayesian optimization in high dimensions via random embeddings. In: Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence, IJCAI 2013, pp. 1778–1784. AAAI Press (2013)
Yang, P., Tang, K., Yao, X.: A parallel divide-and-conquer-based evolutionary algorithm for large-scale optimization. IEEE Access 7, 163105–163118 (2019)
Yang, Z., Tang, K., Yao, X.: Large scale evolutionary optimization using cooperative coevolution. Inf. Sci. 178(15), 2985–2999 (2008)
Zhang, L., Mahdavi, M., Jin, R., Yang, T., Zhu, S.: Recovering the optimal solution by dual random projection. In: Proceedings of the 26th Annual Conference on Learning Theory, vol. 30, pp. 135–157 (2013)
Appendix
Hyper-parameters. In NCSRE, the number of populations is set to 7 and the population size of each sub-process is set to 2. For a network with 1.7 million parameters, we set the embedding dimension to 100, balancing the computational resource limitation against the optimization effect. Generally speaking, a 1000-dimensional effective space can achieve better performance if matrix-multiplication acceleration techniques are adopted. For NCS-C, the learning rate of sigma and the adaptive negative-correlation coefficient are set as in the original paper, and the initial value of sigma is determined by grid search and set to 0.01 when the bound is [−0.1, 0.1]. The bound of the parameters defines our search space and should be carefully confirmed for each RL problem. We determined the bound of the D-dimensional space according to existing research; most parameters of well-performing networks lie in [−10, 10]. In our experiments, alpha is set to a constant for convenience. Notably, the phase length and the epoch length correspond to each other so that the sub-processes match the random embedding framework. In NCS-C, each sigma is adapted every epoch iterations, and epoch is usually set to 5 or 10. As for \(L_{phase}\), there is a trade-off: too long a phase leaves fewer embedding subspaces to be explored, while too short a phase leads to insufficient optimization within each subspace. Considering these factors, we set \(L_{phase}\) to 30 and epoch to 5. Moreover, each network is re-evaluated t times to reduce the noise in the games, and the average score is taken as the fitness. The initial value of t is set to 3 and is adaptively increased to 6 according to \(3 \cdot \frac{t}{t_{max}}\) (Table 2).
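To make the embedding step concrete, the following is a minimal sketch of how a low-dimensional search point could be mapped to the full parameter vector under the settings above, assuming a Gaussian embedding matrix; the names `embed`, `evaluate_avg`, and `x_anchor` are illustrative and are not the identifiers used in the released code.

```python
import numpy as np

# Dimensions: the paper uses D = 1.7 million and d = 100; D is reduced
# here only so the sketch runs cheaply.
D, d = 10_000, 100

rng = np.random.default_rng(0)
# Gaussian random embedding matrix A in R^{D x d}; in NCSRE each
# sub-process works in its own randomly drawn subspace.
A = rng.standard_normal((D, d))

def embed(y, x_anchor):
    """Map a low-dimensional point y (searched within [-0.1, 0.1]^d)
    to the full parameter vector, clipped to the bound [-10, 10]."""
    return np.clip(x_anchor + A @ y, -10.0, 10.0)

def evaluate_avg(rollout, y, x_anchor, t=3):
    """Fitness of y: the average of t noisy game evaluations."""
    x = embed(y, x_anchor)
    return float(np.mean([rollout(x) for _ in range(t)]))

# Usage with a stand-in rollout function:
x0 = np.zeros(D)
y0 = rng.uniform(-0.1, 0.1, d)
score = evaluate_avg(lambda x: -np.sum(x ** 2), y0, x0)
```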
Implementation. The implementation code is available on GitHub at https://github.com/Desein-Yang/NCS-RL.
Network Architecture. The parameters of the policy network architecture used in our experiments are listed in Table 3.
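To make the phase/epoch schedule described under Hyper-parameters concrete, below is a simplified sketch of the control flow, assuming the values quoted there (7 populations, \(L_{phase} = 30\), epoch = 5); all helper names are illustrative placeholders rather than the API of the released NCS-RL code.

```python
N_POP, L_PHASE, EPOCH = 7, 30, 5

def redraw_embedding(pop):   # start a phase in a fresh random subspace
    pass                     # placeholder: sample a new embedding matrix

def ncs_c_step(pop):         # one NCS-C iteration inside the subspace
    pass                     # placeholder: mutate, evaluate, select

def adapt_sigma(pop):        # NCS-C step-size adaptation
    pass

def run_phase(pop):
    redraw_embedding(pop)
    for it in range(1, L_PHASE + 1):
        ncs_c_step(pop)
        if it % EPOCH == 0:  # sigma is adapted every `epoch` iterations
            adapt_sigma(pop)

# The N_POP sub-processes run their phases in parallel and exchange
# their best solutions before the next phase begins.
for pop in range(N_POP):
    run_phase(pop)
```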