Collective intelligence evolution using ant colony optimization and neural networks


Abstract

Recently, the theory of collective intelligence (CI) evolution was proposed as a meta-algorithm toward artificial general intelligence, but the only existing implementation of its CI algorithm is the Monte Carlo tree search (MCTS) used by AlphaZero. Since ant colony optimization (ACO) is an extensively used CI algorithm, it is useful to implement CI evolution with ACO. A genetic version of ACO is adapted to satisfy the CI evolution theory in two ways. One method uses a policy network, namely policy network guided ACO (P-ACO). The other uses both a policy network and a value network, namely policy and value network guided ACO (PV-ACO). Both versions of the ACO evolution algorithm are applied to Tic-Tac-Toe and Four in a Row, games in which traditional ACO plays poorly compared with tree search algorithms such as MCTS. Computational experiments compare both methods with pure ACO and MCTS. As a result, the intelligence level of the ACO evolution algorithm quickly exceeds that of pure ACO and MCTS. In this article, the performance of the ACO evolution algorithm is analyzed and the feasibility of applying the CI evolution theory to a specific application is verified.




Acknowledgements

The authors would like to thank Ji Hua Laboratory for supporting this work (project ID X190021TB190). This study was also supported by the Shanghai Engineering Research Center of AI and Robotics, Fudan University, China, and the Engineering Research Center of AI and Robotics, Ministry of Education, China.

Author information

Corresponding author

Correspondence to Zhongxue Gan.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

1.1 Neural network architecture

The neural network architecture is a simplified version of that of AlphaZero. The input to the neural network is an \(n\times n \times 4\) image stack comprising 4 binary feature planes, where n denotes the size of the board. The first two feature planes, \(X_t\) and \(Y_t\), consist of binary values indicating the presence of the current player's and the opponent's pieces, respectively: \(X_t^i = 1\) if intersection i contains a stone of the current player at time step t, \(Y_t^i = 1\) if intersection i contains a stone of the opponent, and \(X_t^i = Y_t^i = 0\) if intersection i is empty. The third feature plane, \(M_{t-1}\), encodes the opponent's latest move: the position corresponding to that move is set to 1 and all others to 0. The final feature plane, C, denotes the color of the player to move and has a constant value of 1 if black is to play or 0 if white is to play. These planes are concatenated to form the input features \(s_t=[X_t, Y_t, M_{t-1}, C]\). The history feature \(M_{t-1}\) and the color feature C are not strictly necessary in games like Four in a Row, because the current board already contains complete information for determining the optimal policy and value function. Nonetheless, the current player tends to place its next move near the opponent's latest move, so the history feature \(M_{t-1}\) can be instructive. Similarly, the color feature C carries information about whether the current player is on the offensive and tends to win.
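
For concreteness, the following sketch builds the \(n\times n\times 4\) input stack described above for a single position. It is only an illustration of this encoding: the helper name encode_state and the board convention (+1 for the current player's stones, -1 for the opponent's, 0 for empty) are our own assumptions, and the planes are stacked channel-first for convenience.

```python
import numpy as np

def encode_state(board, last_move, color_to_play, n):
    """Build the input stack s_t = [X_t, Y_t, M_{t-1}, C], shape (4, n, n).

    board: n x n array, +1 for the current player's stones,
           -1 for the opponent's stones, 0 for empty intersections.
    last_move: (row, col) of the opponent's latest move, or None.
    color_to_play: 1 if black is to play, 0 if white is to play.
    """
    x = (board == 1).astype(np.float32)     # current player's pieces
    y = (board == -1).astype(np.float32)    # opponent's pieces
    m = np.zeros((n, n), dtype=np.float32)  # opponent's latest move
    if last_move is not None:
        m[last_move] = 1.0
    c = np.full((n, n), float(color_to_play), dtype=np.float32)  # color plane
    return np.stack([x, y, m, c])
```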

The input features \(s_t\) are processed by a public convolutional block followed by two separate heads. The public convolutional block uses the following modules:

  1. A convolution of 32 filters of kernel size \(3\times 3\) with stride 1 and padding 1, which is activated by a rectifier nonlinearity;

  2. A convolution of 64 filters of kernel size \(3\times 3\) with stride 1 and padding 1, which is activated by a rectifier nonlinearity;

  3. A convolution of 128 filters of kernel size \(3\times 3\) with stride 1 and padding 1, which is activated by a rectifier nonlinearity.

The output of the public convolutional block is sent to two separate heads for computing the policy and value. The policy head applies the following modules:

  1. A convolution of 4 filters of kernel size \(1\times 1\) with stride 1, which is activated by a rectifier nonlinearity;

  2. A fully connected linear layer that outputs a vector of size \(n^2\);

  3. A soft-max nonlinearity outputting move probabilities for all positions, where illegal moves are masked out during play and the remaining probabilities are re-normalized, as sketched below.
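
The masking and re-normalization in step 3 can be implemented, for example, as in the minimal sketch below; the function name and the convention that 0 marks an empty intersection are our own.

```python
import numpy as np

def mask_and_renormalize(probs, board):
    """Zero out illegal moves in the policy output and re-normalize.

    probs: flat array of length n*n from the soft-max of the policy head.
    board: n x n array where 0 marks an empty (legal) intersection.
    """
    legal = (board.flatten() == 0).astype(np.float32)
    masked = probs * legal
    total = masked.sum()
    if total > 0:
        return masked / total
    return legal / legal.sum()  # fall back to uniform over legal moves
```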

The value head applies the following modules:

  1. A convolution of 2 filters of kernel size \(1\times 1\) with stride 1, which is activated by a rectifier nonlinearity;

  2. A fully connected linear layer of size 64, which is activated by a rectifier nonlinearity;

  3. A fully connected linear layer to a scalar;

  4. A tanh nonlinearity outputting a scalar in the range \([-1,1]\).

The overall network has a depth of 5 layers along the policy path and 6 along the value path: 3 public convolutional layers, plus 2 layers for the policy head and 3 layers for the value head. With this simplified network, training and prediction are relatively fast.
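
A minimal PyTorch sketch of this architecture is given below. It reflects our reading of the description in this appendix rather than the authors' released code; the class and attribute names are our own, and the input is the channel-first 4-plane stack from Sect. 1.1.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PolicyValueNet(nn.Module):
    """Public convolutional block with separate policy and value heads."""

    def __init__(self, n):
        super().__init__()
        self.n = n
        # Public block: 32 -> 64 -> 128 filters, 3x3 kernels, stride 1, padding 1.
        self.conv1 = nn.Conv2d(4, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(64, 128, kernel_size=3, padding=1)
        # Policy head: 4 filters of 1x1, then a linear layer to n^2 logits.
        self.policy_conv = nn.Conv2d(128, 4, kernel_size=1)
        self.policy_fc = nn.Linear(4 * n * n, n * n)
        # Value head: 2 filters of 1x1, a 64-unit hidden layer, then a scalar.
        self.value_conv = nn.Conv2d(128, 2, kernel_size=1)
        self.value_fc1 = nn.Linear(2 * n * n, 64)
        self.value_fc2 = nn.Linear(64, 1)

    def forward(self, s):
        # s: (batch, 4, n, n) input feature stack.
        h = F.relu(self.conv1(s))
        h = F.relu(self.conv2(h))
        h = F.relu(self.conv3(h))
        # Policy head: soft-max over the n^2 board positions.
        p = F.softmax(self.policy_fc(F.relu(self.policy_conv(h)).flatten(1)), dim=1)
        # Value head: scalar in [-1, 1].
        v = F.relu(self.value_fc1(F.relu(self.value_conv(h)).flatten(1)))
        v = torch.tanh(self.value_fc2(v))
        return p, v
```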

1.2 Sensitivity of the ant number

Since the number of ants is a critical parameter of conventional ACO, we perform simple experiments to analyze the sensitivity of the proposed algorithm to it. The experiments use a toy navigation task, grid-based path planning. The conventional ACO and the policy neural network guided ACO are tested on a map of \(10\times 10\) grids with randomly generated start and target states. The average lengths of the trajectories over 20 trials are presented in Fig. 8. Comparing the two algorithms, it is clear that for the conventional ACO a greater number of ants leads to quicker convergence. For the proposed method, there is a visible difference between the one-ant case and the others in Fig. 8b, but comparing the curves for 2, 4 and 8 ants shows that the number of iterations is not sensitive to the number of ants. It can be concluded that more ants (\(>2\)) do not improve performance significantly but only increase the total amount of computation, so the proposed method does not need as many ants as the conventional ACO. The reason is that the conventional ACO uses many ants to explore the feasible space and avoid being trapped in a local optimum, whereas in our method exploration is also driven by the stochasticity of the neural network output, so fewer ants suffice across the iterations of searching the planning space. In other words, the solution is found either by many ants (conventional ACO) or by fewer ants iterated recursively again and again.
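
To make the network's role in move selection concrete, the sketch below contrasts the standard ACO transition rule with one plausible way of blending a policy-network prior into the pheromone weighting. The blending rule is illustrative only, not the paper's exact guidance rule, and all names and parameters are our own assumptions.

```python
import numpy as np

def aco_transition_probs(pheromone, heuristic, alpha=1.0, beta=2.0):
    """Conventional ACO: move probabilities over the feasible neighbours."""
    weights = (pheromone ** alpha) * (heuristic ** beta)
    return weights / weights.sum()

def guided_transition_probs(pheromone, policy_prior, alpha=1.0, gamma=1.0):
    """Policy-network guided variant (illustrative): the network prior
    augments or replaces the hand-crafted heuristic, so exploration is
    partly driven by the stochastic network output rather than by a
    large ant population.

    policy_prior: network output restricted to the feasible neighbours.
    gamma: strength of the network guidance.
    """
    weights = (pheromone ** alpha) * (policy_prior ** gamma)
    return weights / weights.sum()
```

Under such a rule, even a small number of ants inherits exploration from the stochastic network output, which is consistent with the weak sensitivity to the ant number seen in Fig. 8b.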

Fig. 8 Convergence analysis for different numbers of ants on the toy navigation task (grid-based path planning) using two algorithms: a the conventional ACO, b the policy neural network guided ACO

About this article

Cite this article

Qi, X., Gan, Z., Liu, C. et al. Collective intelligence evolution using ant colony optimization and neural networks. Neural Comput & Applic 33, 12721–12735 (2021). https://doi.org/10.1007/s00521-021-05918-7

