
Effective node selection technique towards sparse learning

Abstract

Neural networks are growing wider and deeper to achieve state-of-the-art results across machine learning domains, which leads to complex structures, large model sizes, and high computational costs. Moreover, such networks fail to adapt to new data because they are confined to the specific domain-target space they were trained on. To tackle these issues, we propose a sparse learning method that trains an existing network on new classes by selecting its non-crucial parameters. Sparse learning preserves the performance on existing classes, with no additional network structure or memory cost, through an effective node selection technique that uses information theory to identify unimportant parameters from the neuron distributions of the fully connected layers. Our method can learn up to 40% additional novel classes without notable loss in the accuracy on existing classes. Experiments show that sparse learning is competitive with state-of-the-art methods in accuracy and surpasses related algorithms in memory efficiency, processing speed, and overall training time. Importantly, our method can be implemented in both small and large applications, which we demonstrate on well-known networks such as LeNet, AlexNet, and VGG-16.
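The abstract compresses the method into two steps: score the neurons of the fully connected layers with an information-theoretic measure of their activation distributions, then fine-tune only the low-scoring ("non-crucial") parameters on the new classes while every other weight stays frozen. The sketch below illustrates that idea in PyTorch. It is not the authors' implementation: the histogram-based entropy estimator, the 40% selection ratio, and the row-masked SGD update are all assumptions made for illustration.

```python
import torch
import torch.nn as nn

def neuron_entropy(activations: torch.Tensor, bins: int = 30) -> torch.Tensor:
    """Per-neuron entropy of fully connected layer activations.

    activations: (num_samples, num_neurons), e.g. collected with a forward
    hook on held-out data. Low entropy is read here as "less informative",
    marking the neuron as a candidate for retraining on new classes.
    """
    entropies = []
    for j in range(activations.shape[1]):
        hist = torch.histc(activations[:, j].float(), bins=bins)
        p = hist / hist.sum()      # empirical bin probabilities
        p = p[p > 0]               # treat 0 * log(0) as 0
        entropies.append(-(p * p.log()).sum())
    return torch.stack(entropies)

def select_noncrucial(entropies: torch.Tensor, ratio: float = 0.4) -> torch.Tensor:
    """Indices of the lowest-entropy (non-crucial) neurons."""
    k = int(ratio * entropies.numel())
    return torch.argsort(entropies)[:k]

def masked_sgd_step(fc: nn.Linear, idx: torch.Tensor, lr: float = 1e-2) -> None:
    """SGD update restricted to the rows of fc.weight selected by idx.

    Assumes loss.backward() has already populated the gradients; rows not
    in idx are left untouched, which preserves the existing classes.
    """
    row_mask = torch.zeros_like(fc.weight)
    row_mask[idx] = 1.0
    with torch.no_grad():
        fc.weight -= lr * fc.weight.grad * row_mask
        if fc.bias is not None:
            fc.bias[idx] -= lr * fc.bias.grad[idx]
```

In a training loop one would record the FC activations once on existing-class data, call select_noncrucial to fix the set of retrainable neurons, and then apply masked_sgd_step after each backward pass on new-class batches, so the frozen rows keep the old-class behavior intact.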


References

1. Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 580–587

2. Girshick R (2015) Fast R-CNN. In: Proceedings of the IEEE international conference on computer vision, pp 1440–1448

3. Hwang J-J, Liu T-L (2015) Pixel-wise deep learning for contour detection. arXiv preprint arXiv:1504.01989

4. Johnson J, Karpathy A, Fei-Fei L (2016) DenseCap: fully convolutional localization networks for dense captioning. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4565–4574

5. Chen W, Wilson J, Tyree S, Weinberger KQ, Chen Y (2016) Compressing convolutional neural networks in the frequency domain. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pp 1475–1484

6. Farabet C, Couprie C, Najman L, LeCun Y (2012) Learning hierarchical features for scene labeling. IEEE Trans Pattern Anal Mach Intell 35(8):1915–1929

7. Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3431–3440

8. Bansal A, Russell B, Gupta A (2016) Marr revisited: 2D-3D alignment via surface normal prediction. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5965–5974

9. Eigen D, Fergus R (2015) Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture. In: Proceedings of the IEEE international conference on computer vision, pp 2650–2658

10. Wang X, Fouhey D, Gupta A (2015) Designing deep networks for surface normal estimation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 539–547

11. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778

12. Ren S, He K, Girshick R, Sun J (2015) Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in neural information processing systems, pp 91–99

13. Sermanet P, Eigen D, Zhang X, Mathieu M, Fergus R, LeCun Y (2014) OverFeat: integrated recognition, localization and detection using convolutional networks. In: International conference on learning representations (ICLR)

14. Boukli Hacene G, Gripon V, Farrugia N, Arzel M, Jezequel M (2018) Transfer incremental learning using data augmentation. Appl Sci 8(12):2512

15. Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. In: Advances in neural information processing systems, pp 1097–1105

16. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556

17. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, …, Rabinovich A (2015) Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1–9

18. Han K, Vedaldi A, Zisserman A (2019) Learning to discover novel visual categories via deep transfer clustering. In: Proceedings of the IEEE international conference on computer vision, pp 8401–8409

19. Harris B, Bae I, Egger B (2019) Architectures and algorithms for on-device user customization of CNNs. Integration 67:121–133

20. Lawrence ND, Platt JC (2004) Learning to learn with the informative vector machine. In: Proceedings of the twenty-first international conference on machine learning, p 65

21. Bonilla EV, Chai KM, Williams C (2008) Multi-task Gaussian process prediction. In: Advances in neural information processing systems, pp 153–160

22. Schwaighofer A, Tresp V, Yu K (2005) Learning Gaussian process kernels via hierarchical Bayes. In: Advances in neural information processing systems, pp 1209–1216

23. Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2016) Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2818–2826

24. Wang Y-X, Hebert M (2016) Learning from small sample sets by combining unsupervised meta-training with CNNs. In: Advances in neural information processing systems, pp 244–252

25. Wu Y, Chen Y, Wang L, Ye Y, Liu Z, Guo Y, Fu Y (2019) Large scale incremental learning. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 374–382

26. Ando RK, Zhang T (2005) A framework for learning predictive structures from multiple tasks and unlabeled data. J Mach Learn Res 6:1817–1853

27. Castro FM, Marín-Jiménez MJ, Guil N, Schmid C, Alahari K (2018) End-to-end incremental learning. In: Proceedings of the European conference on computer vision (ECCV), pp 233–248

28. Argyriou A, Evgeniou T, Pontil M (2008) Convex multi-task feature learning. Mach Learn 73(3):243–272

29. Hinton G, Vinyals O, Dean J (2015) Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531

30. Wu J, Leng C, Wang Y, Hu Q, Cheng J (2016) Quantized convolutional neural networks for mobile devices. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4820–4828

31. Yu D, Deng L (2010) Deep learning and its applications to signal and information processing [Exploratory DSP]. IEEE Signal Process Mag 28(1):145–154

32. Denil M, Shakibi B, Dinh L, Ranzato MA, De Freitas N (2013) Predicting parameters in deep learning. In: Advances in neural information processing systems, pp 2148–2156

33. Cheng J, Wu J, Leng C, Wang Y, Hu Q (2017) Quantized CNN: a unified approach to accelerate and compress convolutional networks. IEEE Trans Neural Netw Learn Syst 29(10):4730–4743

34. Hu H, Peng R, Tai YW, Tang CK (2016) Network trimming: a data-driven neuron pruning approach towards efficient deep architectures. arXiv preprint arXiv:1607.03250

35. Hur C, Kang S (2019) Entropy-based pruning method for convolutional neural networks. J Supercomput 75(6):2950–2963

36. Han S, Pool J, Tran J, Dally W (2015) Learning both weights and connections for efficient neural network. In: Advances in neural information processing systems, pp 1135–1143

37. LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324

38. Fei-Fei L, Fergus R, Perona P (2004) Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories. In: 2004 conference on computer vision and pattern recognition workshop. IEEE, p 178

39. Doersch C, Zisserman A (2019) Sim2real transfer learning for 3D human pose estimation: motion to the rescue. In: Advances in neural information processing systems, pp 12929–12941

40. Dawalatabad N, Madikeri S, Sekhar CC, Murthy HA (2019) Incremental transfer learning in two-pass information bottleneck based speaker diarization system for meetings. In: ICASSP 2019 – 2019 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 6291–6295

41. Deng J, Frühholz S, Zhang Z, Schuller B (2017) Recognizing emotions from whispered speech based on acoustic feature transfer learning. IEEE Access 5:5235–5246


Acknowledgments

This work was supported by an Inha University Research Grant. This research was also supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (NRF-2019R1A2C1008048).

Author information

Corresponding author

Correspondence to Sanggil Kang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Ibrokhimov, B., Hur, C. & Kang, S. Effective node selection technique towards sparse learning. Appl Intell 50, 3239–3251 (2020). https://doi.org/10.1007/s10489-020-01720-5
