ModPSO-CNN: an evolutionary convolution neural network with application to visual recognition

  • Methodologies and Application
Soft Computing

Abstract

Training optimization plays a vital role in the development of convolution neural networks (CNNs). The optimization problem for a CNN is non-convex and therefore has multiple local minima, which makes CNNs hard to train. If any of the chosen hyper-parameters is inappropriate, training ends up at a bad local minimum, which leads to poor performance. Hence, proper optimization of the training algorithm is the key to converging to a good local minimum. In this paper, we therefore introduce an evolutionary convolution neural network (ModPSO-CNN) algorithm. The proposed algorithm fuses modified particle swarm optimization (ModPSO) with backpropagation (BP) and a CNN: the CNN is trained with ModPSO together with BP, which improves performance by avoiding premature convergence and local minima. ModPSO has adaptive, dynamic and improved parameters to handle the issues in training a CNN. The adaptive and dynamic parameters balance global and local search ability, while the improved parameter maintains the diversity of the swarm. The proposed ModPSO algorithm is validated on three standard mathematical test functions and compared with three variants of the benchmark PSO algorithm. Furthermore, the performance of the proposed ModPSO-CNN is compared with other training algorithms in terms of computational cost, convergence and accuracy on standard classification problems, such as the CIFAR-10 dataset and a face and skin detection dataset.
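
To make the role of the swarm parameters concrete, the following is a minimal sketch of a textbook particle swarm optimizer with a linearly decreasing inertia weight, run on the sphere test function. It is not the authors' ModPSO: the abstract does not specify the adaptive, dynamic and improved parameters, so the inertia schedule (`w_start`, `w_end`), the acceleration coefficients (`c1`, `c2`), the velocity clamp and the choice of test function are all assumptions made for illustration.

```python
import random


def sphere(x):
    """Sphere test function: global minimum 0 at the origin."""
    return sum(v * v for v in x)


def pso(f, dim=5, n_particles=20, iters=200, bounds=(-5.0, 5.0),
        w_start=0.9, w_end=0.4, c1=2.0, c2=2.0, seed=0):
    """Textbook PSO with a linearly decreasing inertia weight (illustrative only)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]

    # Personal bests start at the initial positions.
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]

    # Global best is the best of the personal bests.
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    v_max = 0.5 * (hi - lo)  # assumed velocity clamp
    for t in range(iters):
        # Large inertia early favours global search, small inertia late favours local search.
        w = w_start - (w_start - w_end) * t / max(iters - 1, 1)
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])   # pull toward personal best
                     + c2 * r2 * (gbest[d] - pos[i][d]))     # pull toward global best
                v = max(-v_max, min(v_max, v))
                vel[i][d] = v
                pos[i][d] = max(lo, min(hi, pos[i][d] + v))  # keep inside the search box
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


if __name__ == "__main__":
    best, best_val = pso(sphere)
    print("best value found:", best_val)
```

In a hybrid scheme of the kind the abstract describes, the objective `f` would be the network's training loss and each particle a candidate weight vector, with backpropagation refining the candidates; how the two are interleaved in ModPSO-CNN is not stated here, so that coupling is left out of the sketch.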


Acknowledgements

This work is supported in part by the National Natural Science Foundation of China (No. 61801008), the National Key R&D Program of China (No. 2018YFB0803600), the Beijing Natural Science Foundation (No. L172049), and the Beijing Science and Technology Planning Project (No. Z171100004717001).

Author information

Corresponding author

Correspondence to Sadaqat ur Rehman.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Communicated by V. Loia.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Cite this article

Tu, S., Rehman, S.u., Waqas, M. et al. ModPSO-CNN: an evolutionary convolution neural network with application to visual recognition. Soft Comput 25, 2165–2176 (2021). https://doi.org/10.1007/s00500-020-05288-7

  • DOI: https://doi.org/10.1007/s00500-020-05288-7
