Abstract
Neural networks are growing wider and deeper to achieve state-of-the-art results across machine learning domains, at the cost of complex structures, large model sizes, and high computational demands. Moreover, once trained, these networks struggle to adapt to new data because they are confined to their original domain-target space. To tackle these issues, we propose a sparse learning method that trains an existing network on new classes by selecting non-crucial parameters from the network. Sparse learning also preserves the performance on existing classes with no additional network structure or memory cost by employing an effective node selection technique, which identifies unimportant parameters by applying information theory to the neuron distributions of the fully connected layers. Our method can learn up to 40% novel classes without notable loss in the accuracy of existing classes. Through experiments, we show that sparse learning competes with state-of-the-art methods in accuracy and surpasses related algorithms in memory efficiency, processing speed, and overall training time. Importantly, our method can be applied in both small and large settings, which we demonstrate on well-known networks such as LeNet, AlexNet, and VGG-16.
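The entropy-based node selection the abstract describes can be illustrated with a minimal sketch. This is an assumption-laden reconstruction of the general idea, not the paper's exact criterion: neurons in a fully connected layer whose activation distribution carries little information (low entropy) are treated as non-crucial and become candidates for relearning on new classes. The helper names (`neuron_entropy`, `select_non_crucial`) and the histogram-based entropy estimate are illustrative choices, not from the paper.

```python
import numpy as np

def neuron_entropy(activations, bins=10):
    """Per-neuron entropy (in nats) of the activation distribution.

    activations: (n_samples, n_neurons) array of FC-layer outputs
    recorded on data from the existing classes.
    """
    entropies = np.empty(activations.shape[1])
    for j in range(activations.shape[1]):
        hist, _ = np.histogram(activations[:, j], bins=bins)
        p = hist / hist.sum()
        p = p[p > 0]  # drop empty bins; 0*log(0) is taken as 0
        entropies[j] = -(p * np.log(p)).sum()
    return entropies

def select_non_crucial(activations, fraction=0.4, bins=10):
    """Indices of the lowest-entropy (least informative) neurons.

    These are the parameters a sparse-learning step could retrain
    on novel classes while leaving high-entropy neurons untouched.
    """
    ent = neuron_entropy(activations, bins)
    k = max(1, int(fraction * activations.shape[1]))
    return np.argsort(ent)[:k]

# Toy example: neurons 0-1 produce constant outputs (zero entropy),
# neurons 2-3 vary across samples (high entropy).
rng = np.random.default_rng(0)
acts = np.column_stack([
    np.full(1000, 1.0),        # dead neuron: constant output
    np.full(1000, 0.5),        # dead neuron: constant output
    rng.uniform(0, 1, 1000),   # informative neuron
    rng.normal(0, 1, 1000),    # informative neuron
])
idx = select_non_crucial(acts, fraction=0.5)
print(sorted(idx.tolist()))  # → [0, 1]
```

With per-neuron histogram binning, a neuron with a constant output collapses into a single bin and scores zero entropy, so it is selected first; in a real pipeline the activations would come from forward passes over the existing-class data rather than synthetic arrays.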
Acknowledgments
This work was supported by Inha University Research Grant. This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2019R1A2C1008048).
Cite this article
Ibrokhimov, B., Hur, C. & Kang, S. Effective node selection technique towards sparse learning. Appl Intell 50, 3239–3251 (2020). https://doi.org/10.1007/s10489-020-01720-5