
On-Device Partial Learning Technique of Convolutional Neural Network for New Classes

Published in: Journal of Signal Processing Systems

Abstract

In general, Convolutional Neural Networks (CNNs) have a complex network structure consisting of heavy layers with a huge number of parameters, such as convolutional, pooling, ReLU-activation, and fully-connected layers. Due to this complexity and computational load, CNNs are typically trained in a cloud environment. Cloud-based learning and inference have two drawbacks: the security of personal information and dependence on the communication state. Recently, CNNs have been trained directly on mobile devices to alleviate these two drawbacks. Due to the resource limitations of mobile devices, however, the CNN structure must be compressed or the training overhead reduced. In this paper, we propose an on-device partial learning technique with the following benefits: (1) it does not require additional neural network structures, and (2) it reduces unnecessary computational overhead. We select a subset of influential weights from a trained network to accommodate a new classification class. The selection is based on the contribution of each weight to the output, which is measured using the concept of entropy. In the experimental section, we demonstrate and analyze our method with a CNN image classifier on two datasets: the Modified National Institute of Standards and Technology (MNIST) image data and the Microsoft Common Objects in Context (COCO) data. As a result, computational resources for LeNet-5 and AlexNet showed performance improvements of 1.7× and 2.3×, respectively, and memory resources showed performance improvements of 1.4× and 1.6×, respectively.
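The full text is paywalled, so the following is only an illustrative sketch of the idea the abstract describes: ranking weights by the entropy of their contribution to the layer output and retraining only the top-ranked subset for a new class. All function names, the histogram-based entropy estimate, and the selection ratio are assumptions of this sketch, not the authors' implementation.

```python
import numpy as np

def weight_contribution_entropy(weights, activations, num_bins=16):
    """Estimate, for each weight of a fully-connected layer, the Shannon
    entropy of its contribution (input activation * weight) to the layer
    output over a batch.  weights: (n_in, n_out), activations: (batch, n_in).
    Histogram binning is one simple way to estimate the distribution."""
    # contributions[b, i, j] = activations[b, i] * weights[i, j]
    contributions = activations[:, :, None] * weights[None, :, :]
    entropies = np.zeros_like(weights)
    for i in range(weights.shape[0]):
        for j in range(weights.shape[1]):
            hist, _ = np.histogram(contributions[:, i, j], bins=num_bins)
            p = hist / hist.sum()
            p = p[p > 0]  # drop empty bins before taking the log
            entropies[i, j] = -np.sum(p * np.log2(p))
    return entropies

def select_influential(weights, activations, ratio=0.3):
    """Return a boolean mask marking roughly the top `ratio` fraction of
    weights by contribution entropy; only these would be updated when
    learning a new class, leaving the rest of the network frozen."""
    ent = weight_contribution_entropy(weights, activations)
    k = max(1, int(ratio * ent.size))
    threshold = np.sort(ent.ravel())[-k]
    return ent >= threshold
```

In a partial-learning loop, the mask would gate the gradient update (e.g. `W -= lr * grad * mask`), so computation and memory are spent only on the selected subset, which is the kind of saving the abstract quantifies.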



Acknowledgements

This work was supported by Inha University Grant.

Author information

Corresponding author

Correspondence to Sanggil Kang.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article


Cite this article

Hur, C., Kang, S. On-Device Partial Learning Technique of Convolutional Neural Network for New Classes. J Sign Process Syst 95, 909–920 (2023). https://doi.org/10.1007/s11265-020-01520-7

