Abstract
Image classification on mobile devices can provide convenient and secure services for users of various social applications. Traditional classification methods rely mainly on manual labeling by users, and automatic classification still suffers from limited accuracy. With the development of convolutional neural networks (CNNs), the design of lightweight neural networks has become a hot topic. However, state-of-the-art studies often sacrifice classification accuracy for network compactness, which greatly degrades usability. In this paper, a new neural network framework, named MobVi, is proposed to improve the accuracy of lightweight neural networks through solution space division. MobVi consists of two stages: image solution space division and class judgment. The former uses a deep-learning-based clustering method to determine which small solution space an image belongs to, while the latter uses a lightweight neural network customized for that solution space to predict the class. To reduce the number of model parameters and computations, we design a customized CNN module. Finally, we propose an energy prediction model to assess whether the model can be deployed successfully on mobile devices. A series of experiments shows that MobVi outperforms most existing models for mobile devices. Our model achieves 83.5% accuracy on the CIFAR-10 dataset with only 2.0 M parameters.
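The two-stage idea in the abstract — first assign an image to a small solution space by clustering, then hand it to a classifier specialized for that space — can be sketched in miniature. The sketch below is illustrative only and is not the authors' implementation: it uses plain k-means on toy feature vectors (the paper uses deep-learning-based clustering on image features), and the routing step stands in for dispatching to a per-space lightweight CNN.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy feature vectors: two well-separated blobs standing in for two
# "solution spaces" of images (real features would come from a CNN).
X = np.vstack([rng.normal(0.0, 0.5, (50, 8)),
               rng.normal(5.0, 0.5, (50, 8))])

def kmeans(X, k=2, iters=20):
    """Stage 1: partition the feature space into k solution spaces."""
    # Deterministic init: one seed point from each end of the data.
    centers = X[[0, len(X) - 1]][:k]
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        # Recompute each center as the mean of its cluster.
        centers = np.stack([X[labels == j].mean(0) for j in range(k)])
    return centers, labels

centers, labels = kmeans(X)

def route(x, centers):
    """Stage 2 dispatch: pick the solution space (and hence the
    specialized lightweight classifier) for a new image's features."""
    return int(np.argmin(((centers - x) ** 2).sum(-1)))

# An input near the second blob is routed to that blob's expert model.
cluster = route(np.full(8, 5.0), centers)
```

In the full framework each `route` result would select a separate lightweight CNN trained only on its own solution space, which is what lets each expert stay small without giving up accuracy.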
Acknowledgements
This work was supported in part by International Cooperation Project of Shaanxi Province (No. 2020KW-004), the China Postdoctoral Science Foundation (No. 2017M613187), the Key Research and Development Project of Shaanxi Province (No. 2018SF-369), and the Shaanxi Science and Technology Innovation Team Support Project under grant agreement (No. 2018TD-026).
Cite this article
Liu, G., Dai, X., Liu, X. et al. An efficient and low power deep learning framework for image recognition on mobile devices. CCF Trans. Pervasive Comp. Interact. 4, 1–12 (2022). https://doi.org/10.1007/s42486-021-00076-0