Abstract
Recently, a theory of collective intelligence (CI) evolution has been proposed as a meta-algorithm toward artificial general intelligence. However, the only existing implementation of the theory's CI algorithm is the Monte Carlo tree search (MCTS) used by AlphaZero. Since ant colony optimization (ACO) is a widely used CI algorithm, it is worthwhile to implement CI evolution with ACO. In this work, a genetic version of ACO is adapted to satisfy the CI evolution theory in two ways. The first method uses a policy network, yielding policy network guided ACO (P-ACO); the second uses both a policy network and a value network, yielding policy and value network guided ACO (PV-ACO). Both ACO evolution methods are applied to Tic-Tac-Toe and Four in a Row, games in which traditional ACO performs poorly compared with tree search algorithms such as MCTS. Computational experiments compare both methods with pure ACO and MCTS, and the intelligence level of the ACO evolution algorithm quickly surpasses both baselines. This article analyzes the performance of the ACO evolution algorithm and verifies the feasibility of applying the CI evolution theory to a specific application.







Acknowledgements
The authors would like to thank Ji Hua Laboratory for the support of this paper (project ID X190021TB190). This study was also supported by Shanghai Engineering Research Center of AI and Robotics, Fudan University, China and Engineering Research Center of AI and Robotics, Ministry of Education, China.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
1.1 Neural network architecture
The neural network architecture is a simplified version of that used by AlphaZero. The input to the network is an \(n\times n \times 4\) image stack comprising 4 binary feature planes, where n denotes the size of the board. The first two feature planes, \(X_t\) and \(Y_t\), indicate the presence of the current player's and the opponent's pieces, respectively: \(X_t^i = 1\) if intersection i contains a stone of the current player at time step t, \(Y_t^i = 1\) if it contains a stone of the opponent, and \(X_t^i = Y_t^i = 0\) if intersection i is empty. The third feature plane, \(M_{t-1}\), marks the opponent's latest move: the position corresponding to that move is set to 1 and all other positions are 0. The final feature plane, C, denotes the color of the player to move and has a constant value of 1 if black is to play or 0 if white is to play. These planes are concatenated to form the input features \(s_t=[X_t, Y_t, M_{t-1}, C]\). Strictly speaking, the history feature \(M_{t-1}\) and the color feature C are unnecessary in games such as Four in a Row, because the current board already provides perfect information for determining the optimal policy and value function. Nonetheless, the current player tends to place its next move near the opponent's latest move, so the history feature \(M_{t-1}\) can still be instructive. Similarly, the color feature C indicates whether the current player is on the offensive and thus tends to win.
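As a concrete illustration, the following sketch (our own illustrative code, not the paper's implementation; the function name `encode_state` and the board encoding convention are assumptions) builds the \(n\times n\times 4\) input stack \(s_t=[X_t, Y_t, M_{t-1}, C]\) from a board stored as an \(n\times n\) integer array with \(+1\) for the current player's stones, \(-1\) for the opponent's stones and 0 for empty intersections:

```python
import numpy as np

def encode_state(board, last_move, current_is_black):
    """Build the n x n x 4 input stack s_t = [X_t, Y_t, M_{t-1}, C].

    board            : (n, n) int array, +1 current player, -1 opponent, 0 empty
    last_move        : (row, col) of the opponent's latest move, or None
    current_is_black : True if black is to play
    """
    n = board.shape[0]
    x = (board == 1).astype(np.float32)     # X_t: current player's stones
    y = (board == -1).astype(np.float32)    # Y_t: opponent's stones
    m = np.zeros((n, n), dtype=np.float32)  # M_{t-1}: opponent's latest move
    if last_move is not None:
        m[last_move] = 1.0
    c = np.full((n, n), 1.0 if current_is_black else 0.0, dtype=np.float32)  # C
    return np.stack([x, y, m, c])           # shape (4, n, n), channel-first
```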
The input features \(s_t\) are processed by a public convolutional block followed by two separate head networks. The public convolutional block uses the following modules:
(1) A convolution of 32 filters of kernel size \(3\times 3\) with stride 1 and padding 1, which is activated by a rectifier nonlinearity;
(2) A convolution of 64 filters of kernel size \(3\times 3\) with stride 1 and padding 1, which is activated by a rectifier nonlinearity;
(3) A convolution of 128 filters of kernel size \(3\times 3\) with stride 1 and padding 1, which is activated by a rectifier nonlinearity.
The output of the public convolutional block is sent to two separate heads for computing the policy and value. The policy head applies the following modules:
(1) A convolution of 4 filters of kernel size \(1\times 1\) with stride 1, which is activated by a rectifier nonlinearity;
(2) A fully connected linear layer that outputs a vector of size \(n^2\);
(3) A soft-max nonlinear function outputting the probabilities of all positions, where illegal moves are masked out during play and the remaining probabilities are re-normalized.
The value head applies the following modules:
(1) A convolution of 2 filters of kernel size \(1\times 1\) with stride 1, which is activated by a rectifier nonlinearity;
(2) A fully connected linear layer of size 64, which is activated by a rectifier nonlinearity;
(3) A fully connected linear layer to a scalar;
(4) A tanh nonlinearity outputting a scalar in the range \([-1,1]\).
The overall network thus has a depth of 5 (along the policy path) or 6 (along the value path): 3 public convolutional layers, plus 2 layers in the policy head and 3 layers in the value head. With this simplified network, training and prediction are relatively fast.
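For completeness, a minimal PyTorch sketch of the architecture described above follows. This is a reconstruction under the assumption that ReLU rectifiers are used throughout and that the layers are connected exactly as listed; the class name `PolicyValueNet` and any detail not stated in the text are ours, and the original implementation may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PolicyValueNet(nn.Module):
    """Shared ("public") convolutional trunk with policy and value heads."""

    def __init__(self, n):
        super().__init__()
        # Public convolutional block: 32 -> 64 -> 128 filters, 3x3, stride 1, padding 1.
        self.conv1 = nn.Conv2d(4, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(64, 128, kernel_size=3, padding=1)
        # Policy head: 1x1 convolution with 4 filters, then a linear layer to n^2 logits.
        self.policy_conv = nn.Conv2d(128, 4, kernel_size=1)
        self.policy_fc = nn.Linear(4 * n * n, n * n)
        # Value head: 1x1 convolution with 2 filters, a linear layer of size 64, then a scalar.
        self.value_conv = nn.Conv2d(128, 2, kernel_size=1)
        self.value_fc1 = nn.Linear(2 * n * n, 64)
        self.value_fc2 = nn.Linear(64, 1)

    def forward(self, s):
        # s: (batch, 4, n, n) input feature planes [X_t, Y_t, M_{t-1}, C].
        h = F.relu(self.conv1(s))
        h = F.relu(self.conv2(h))
        h = F.relu(self.conv3(h))
        # Policy head: probabilities over all n^2 positions
        # (illegal moves are masked out and re-normalized at play time).
        p = F.relu(self.policy_conv(h)).flatten(start_dim=1)
        p = F.softmax(self.policy_fc(p), dim=1)
        # Value head: scalar in [-1, 1] estimating the expected game outcome.
        v = F.relu(self.value_conv(h)).flatten(start_dim=1)
        v = F.relu(self.value_fc1(v))
        v = torch.tanh(self.value_fc2(v))
        return p, v
```

For Tic-Tac-Toe (\(n=3\)), for example, `PolicyValueNet(n=3)` maps a batch of \((4, 3, 3)\) feature stacks to a 9-way move distribution and a scalar value estimate.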
1.2 Sensitivity of the ant number
Since the number of ants is a critical parameter of conventional ACO, we perform simple experiments to analyze the sensitivity of the proposed algorithm to it. The experiments use a toy navigation task, grid-based path planning. Conventional ACO and the policy network guided ACO are tested on a \(10\times 10\) grid map with randomly generated start and target states. The average trajectory lengths over 20 trials are presented in Fig. 8. For conventional ACO, a greater number of ants clearly leads to quicker convergence. For the proposed method, the one-ant case differs visibly from the others in Fig. 8b, but a comparison of the 2-, 4- and 8-ant cases shows that the number of iterations required is not sensitive to the number of ants. We conclude that using more than 2 ants does not improve performance significantly and only increases the total amount of computation, so the proposed method does not need as many ants as conventional ACO. The reason is that conventional ACO relies on many ants to explore the feasible space and avoid being trapped in a local optimum, whereas in our method exploration is also driven by the stochasticity of the neural network output, so fewer ants suffice across the search iterations. In other words, conventional ACO searches with many ants in parallel, while our method searches with fewer ants iterated repeatedly.
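To make the role of the ant count explicit, here is a minimal sketch of a conventional ACO baseline on the grid path-planning task (our own illustrative code under assumed hyperparameters, not the implementation used for Fig. 8; `aco_grid_path` and its defaults are hypothetical). Each iteration launches `num_ants` ants, so shrinking the colony directly shrinks the exploration done per iteration; in the proposed method this exploration is partly taken over by the stochasticity of the policy network output.

```python
import numpy as np

def aco_grid_path(n=10, start=(0, 0), goal=(9, 9), num_ants=8,
                  iterations=100, alpha=1.0, beta=2.0, rho=0.1, q=1.0, seed=0):
    """Conventional ACO for shortest-path search on an n x n grid.

    Pheromone is kept per directed edge (cell, move); num_ants is the
    parameter whose sensitivity is discussed above.
    """
    rng = np.random.default_rng(seed)
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    tau = np.ones((n, n, 4))                       # pheromone per cell and move
    best_len = np.inf

    def heuristic(cell):
        # Inverse Manhattan distance to the goal.
        return 1.0 / (1.0 + abs(cell[0] - goal[0]) + abs(cell[1] - goal[1]))

    for _ in range(iterations):
        successful = []
        for _ in range(num_ants):
            cell, visited, path = start, {start}, []
            while cell != goal:
                cands, weights = [], []
                for k, (dr, dc) in enumerate(moves):
                    nxt = (cell[0] + dr, cell[1] + dc)
                    if 0 <= nxt[0] < n and 0 <= nxt[1] < n and nxt not in visited:
                        cands.append((k, nxt))
                        weights.append(tau[cell][k] ** alpha * heuristic(nxt) ** beta)
                if not cands:                      # dead end: discard this ant
                    break
                probs = np.array(weights) / sum(weights)
                k, nxt = cands[rng.choice(len(cands), p=probs)]
                path.append((cell, k))
                visited.add(nxt)
                cell = nxt
            if cell == goal:
                successful.append(path)
                best_len = min(best_len, len(path))
        tau *= 1.0 - rho                           # evaporation
        for path in successful:                    # deposit on successful paths only
            for cell, k in path:
                tau[cell][k] += q / len(path)
    return best_len
```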
Cite this article
Qi, X., Gan, Z., Liu, C. et al. Collective intelligence evolution using ant colony optimization and neural networks. Neural Comput & Applic 33, 12721–12735 (2021). https://doi.org/10.1007/s00521-021-05918-7