Collective intelligence evolution using ant colony optimization and neural networks


Abstract

Recently, the theory of collective intelligence (CI) evolution was proposed as a meta-algorithm toward artificial general intelligence, but the only existing implementation of its CI algorithm is the Monte Carlo tree search (MCTS) used by AlphaZero. Since ant colony optimization (ACO) is an extensively used CI algorithm, it is useful to implement CI evolution with ACO. A genetic version of ACO is adapted to satisfy the CI evolution theory in two ways. One method uses a policy network, namely policy network guided ACO (P-ACO). The other uses both a policy network and a value network, namely policy and value network guided ACO (PV-ACO). Both versions of the ACO evolution algorithm are applied to Tic-Tac-Toe and Four in a Row, games in which traditional ACO plays poorly compared with tree search algorithms such as MCTS. Computational experiments compare both methods with pure ACO and MCTS. As a result, the intelligence level of the ACO evolution algorithm quickly exceeds that of pure ACO and MCTS. In this article, the performance of the ACO evolution algorithm is analyzed and the feasibility of applying the CI evolution theory to a specific application is verified.




Acknowledgements

The authors would like to thank Ji Hua Laboratory for supporting this work (project ID X190021TB190). This study was also supported by the Shanghai Engineering Research Center of AI and Robotics, Fudan University, China, and the Engineering Research Center of AI and Robotics, Ministry of Education, China.

Author information

Corresponding author

Correspondence to Zhongxue Gan.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

1.1 Neural network architecture

The neural network architecture is a simplified version of that of AlphaZero. The input to the neural network is an \(n\times n \times 4\) image stack comprising 4 binary feature planes, where n denotes the size of the board. The first two feature planes, \(X_t\) and \(Y_t\), consist of binary values indicating the presence of the current player's and the opponent's pieces, respectively: \(X_t^i = 1\) if intersection i contains a stone of the current player at time step t, \(Y_t^i = 1\) if intersection i contains a stone of the opponent, and \(X_t^i = Y_t^i = 0\) if intersection i is empty. The third feature plane, \(M_{t-1}\), encodes the opponent's latest move: the position corresponding to that move is set to 1 and all others to 0. The final feature plane, C, denotes the color of the player to move and has a constant value of 1 if black is to play or 0 if white is to play. These planes are concatenated to form the input features \(s_t=[X_t, Y_t, M_{t-1}, C]\). The history feature \(M_{t-1}\) and the color feature C are not strictly necessary in games like Four in a Row, because the current board already contains complete information for determining the optimal policy and value function. Nonetheless, the current player tends to place its next move near the opponent's latest move, so the history feature \(M_{t-1}\) can be instructive. Similarly, the color feature C carries information about whether the current player is on the offensive and tends to win.
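
For concreteness, the following sketch builds the \(n\times n\times 4\) input stack described above for a single position. It is only an illustration of this encoding: the helper name encode_state and the board convention (+1 for the current player's stones, -1 for the opponent's, 0 for empty) are our own assumptions, and the planes are stacked channel-first for convenience.

```python
import numpy as np

def encode_state(board, last_move, color_to_play, n):
    """Build the input stack s_t = [X_t, Y_t, M_{t-1}, C], shape (4, n, n).

    board: n x n array, +1 for the current player's stones,
           -1 for the opponent's stones, 0 for empty intersections.
    last_move: (row, col) of the opponent's latest move, or None.
    color_to_play: 1 if black is to play, 0 if white is to play.
    """
    x = (board == 1).astype(np.float32)     # current player's pieces
    y = (board == -1).astype(np.float32)    # opponent's pieces
    m = np.zeros((n, n), dtype=np.float32)  # opponent's latest move
    if last_move is not None:
        m[last_move] = 1.0
    c = np.full((n, n), float(color_to_play), dtype=np.float32)  # color plane
    return np.stack([x, y, m, c])
```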

The input features \(s_t\) are processed by a public convolutional block followed by two separate heads. The public convolutional block uses the following modules:

  1. A convolution of 32 filters of kernel size \(3\times 3\) with stride 1 and padding 1, which is activated by a rectifier nonlinearity;

  2. A convolution of 64 filters of kernel size \(3\times 3\) with stride 1 and padding 1, which is activated by a rectifier nonlinearity;

  3. A convolution of 128 filters of kernel size \(3\times 3\) with stride 1 and padding 1, which is activated by a rectifier nonlinearity.

The output of the public convolutional block is sent to two separate heads for computing the policy and value. The policy head applies the following modules:

  1. A convolution of 4 filters of kernel size \(1\times 1\) with stride 1, which is activated by a rectifier nonlinearity;

  2. A fully connected linear layer that outputs a vector of size \(n^2\);

  3. A soft-max nonlinearity outputting move probabilities for all positions, where illegal moves are masked out during play and the remaining probabilities are re-normalized, as sketched below.
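
The masking and re-normalization in step 3 can be implemented, for example, as in the minimal sketch below; the function name and the convention that 0 marks an empty intersection are our own.

```python
import numpy as np

def mask_and_renormalize(probs, board):
    """Zero out illegal moves in the policy output and re-normalize.

    probs: flat array of length n*n from the soft-max of the policy head.
    board: n x n array where 0 marks an empty (legal) intersection.
    """
    legal = (board.flatten() == 0).astype(np.float32)
    masked = probs * legal
    total = masked.sum()
    if total > 0:
        return masked / total
    return legal / legal.sum()  # fall back to uniform over legal moves
```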

The value head applies the following modules:

  1. A convolution of 2 filters of kernel size \(1\times 1\) with stride 1, which is activated by a rectifier nonlinearity;

  2. A fully connected linear layer of size 64, which is activated by a rectifier nonlinearity;

  3. A fully connected linear layer to a scalar;

  4. A tanh nonlinearity outputting a scalar in the range \([-1,1]\).

The overall network has a depth of 5 layers along the policy path and 6 along the value path: 3 public convolutional layers, plus 2 layers for the policy head and 3 layers for the value head. With this simplified network, training and prediction are relatively fast.
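
A minimal PyTorch sketch of this architecture is given below. It reflects our reading of the description in this appendix rather than the authors' released code; the class and attribute names are our own, and the input is the channel-first 4-plane stack from Sect. 1.1.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PolicyValueNet(nn.Module):
    """Public convolutional block with separate policy and value heads."""

    def __init__(self, n):
        super().__init__()
        self.n = n
        # Public block: 32 -> 64 -> 128 filters, 3x3 kernels, stride 1, padding 1.
        self.conv1 = nn.Conv2d(4, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(64, 128, kernel_size=3, padding=1)
        # Policy head: 4 filters of 1x1, then a linear layer to n^2 logits.
        self.policy_conv = nn.Conv2d(128, 4, kernel_size=1)
        self.policy_fc = nn.Linear(4 * n * n, n * n)
        # Value head: 2 filters of 1x1, a 64-unit hidden layer, then a scalar.
        self.value_conv = nn.Conv2d(128, 2, kernel_size=1)
        self.value_fc1 = nn.Linear(2 * n * n, 64)
        self.value_fc2 = nn.Linear(64, 1)

    def forward(self, s):
        # s: (batch, 4, n, n) input feature stack.
        h = F.relu(self.conv1(s))
        h = F.relu(self.conv2(h))
        h = F.relu(self.conv3(h))
        # Policy head: soft-max over the n^2 board positions.
        p = F.softmax(self.policy_fc(F.relu(self.policy_conv(h)).flatten(1)), dim=1)
        # Value head: scalar in [-1, 1].
        v = F.relu(self.value_fc1(F.relu(self.value_conv(h)).flatten(1)))
        v = torch.tanh(self.value_fc2(v))
        return p, v
```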

1.2 Sensitivity of the ant number

Since the number of ants is a critical parameter of conventional ACO, we perform simple experiments to analyze the sensitivity of the proposed algorithm to it. The experiments use a toy navigation task, grid-based path planning. The conventional ACO and the policy neural network guided ACO are tested on a map of \(10\times 10\) grids with randomly generated start and target states. The average lengths of the trajectories over 20 trials are presented in Fig. 8. Comparing the two algorithms, it is clear that for the conventional ACO a greater number of ants leads to quicker convergence. For the proposed method, there is a visible difference between the one-ant case and the others in Fig. 8b, but comparing the curves for 2, 4 and 8 ants shows that the number of iterations is not sensitive to the number of ants. It can be concluded that more ants (\(>2\)) do not improve performance significantly but only increase the total amount of computation, so the proposed method does not need as many ants as the conventional ACO. The reason is that the conventional ACO uses many ants to explore the feasible space and avoid being trapped in a local optimum, whereas in our method exploration is also driven by the stochasticity of the neural network output, so fewer ants suffice across the iterations of searching the planning space. In other words, the solution is found either by many ants (conventional ACO) or by fewer ants iterated recursively again and again.
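
To make the network's role in move selection concrete, the sketch below contrasts the standard ACO transition rule with one plausible way of blending a policy-network prior into the pheromone weighting. The blending rule is illustrative only, not the paper's exact guidance rule, and all names and parameters are our own assumptions.

```python
import numpy as np

def aco_transition_probs(pheromone, heuristic, alpha=1.0, beta=2.0):
    """Conventional ACO: move probabilities over the feasible neighbours."""
    weights = (pheromone ** alpha) * (heuristic ** beta)
    return weights / weights.sum()

def guided_transition_probs(pheromone, policy_prior, alpha=1.0, gamma=1.0):
    """Policy-network guided variant (illustrative): the network prior
    augments or replaces the hand-crafted heuristic, so exploration is
    partly driven by the stochastic network output rather than by a
    large ant population.

    policy_prior: network output restricted to the feasible neighbours.
    gamma: strength of the network guidance.
    """
    weights = (pheromone ** alpha) * (policy_prior ** gamma)
    return weights / weights.sum()
```

Under such a rule, even a small number of ants inherits exploration from the stochastic network output, which is consistent with the weak sensitivity to the ant number seen in Fig. 8b.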

Fig. 8 Convergence analysis for different numbers of ants on the toy navigation task (grid-based path planning) using two algorithms: a the conventional ACO, b the policy neural network guided ACO

About this article

Cite this article

Qi, X., Gan, Z., Liu, C. et al. Collective intelligence evolution using ant colony optimization and neural networks. Neural Comput & Applic 33, 12721–12735 (2021). https://doi.org/10.1007/s00521-021-05918-7

