Abstract
With the development of the Internet of Things (IoT), data are increasingly generated at the edge of a network. Processing tasks at the network edge can effectively address the problems of personal privacy leakage and server overloading, and it has therefore attracted a great deal of attention. A number of efficient convolutional neural network (CNN) models have been proposed for this purpose. However, because they require substantial computing and memory resources, none of them can be deployed on typical edge computing devices such as Raspberry Pi 3B+ and 4B+ while meeting the real-time requirements of user tasks. Considering that a traditional machine learning method can precisely locate an object with an acceptable computational load, this work reviews the state-of-the-art literature and then proposes a CNN with reduced input size for an object detection system that can be deployed on edge computing devices. The system splits an object detection task into object positioning and classification. In particular, this work proposes a CNN model with 44 \(\times\) 44-pixel inputs instead of much larger inputs, e.g., the 224 \(\times\) 224-pixel inputs used in many existing methods, for edge computing devices with slow memory access and limited computing resources. Its overall performance has been verified via a facial expression detection system implemented on Raspberry Pi 3B+ and 4B+. The work makes accurate object detection at the edge possible.
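The split into positioning and classification, and the effect of shrinking the input from 224 \(\times\) 224 to 44 \(\times\) 44 pixels, can be illustrated with a minimal sketch. It is not the authors' implementation: the `locate` and `classify` callables, the `(top, left, height, width)` box format, and the nearest-neighbour resize are all assumptions made for illustration only.

```python
from typing import Callable, List, Tuple

Image = List[List[int]]          # grayscale image as rows of pixel values
Box = Tuple[int, int, int, int]  # hypothetical box format: (top, left, height, width)


def crop_and_resize(image: Image, box: Box, size: int = 44) -> Image:
    """Crop `box` from `image` and resize it to size x size.

    Nearest-neighbour sampling is an assumption chosen for brevity.
    """
    top, left, height, width = box
    return [
        [image[top + (r * height) // size][left + (c * width) // size]
         for c in range(size)]
        for r in range(size)
    ]


def detect(
    image: Image,
    locate: Callable[[Image], List[Box]],   # hypothetical lightweight locator
    classify: Callable[[Image], str],       # hypothetical small-input CNN
    size: int = 44,
) -> List[Tuple[Box, str]]:
    """Two-stage detection: locate objects first, then classify each patch."""
    results = []
    for box in locate(image):
        patch = crop_and_resize(image, box, size)
        results.append((box, classify(patch)))
    return results


def input_reduction(small: int = 44, large: int = 224) -> float:
    """Ratio of pixels per channel between the large and the small input."""
    return (large * large) / (small * small)
```

Under these assumptions, `input_reduction()` evaluates to 50176 / 1936, i.e. roughly 26 times fewer input pixels per channel for the classifier, which is the kind of saving that matters on devices with slow memory access.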
Notes
OpenVINO\(^{\mathrm{TM}}\) toolkit: https://docs.openvinotoolkit.org/latest/index.html.
Acknowledgements
This work was supported by the National Natural Science Foundation of China under Grants 61772366 and 62072192, and by the Natural Science Foundation of Shanghai under Grant 17ZR1445900. The Deanship of Scientific Research (DSR) at King Abdulaziz University, Jeddah, Saudi Arabia, funded this project under Grant No. FP-51-43.
Cite this article
Huang, Z., Yang, S., Zhou, M. et al. Making accurate object detection at the edge: review and new approach. Artif Intell Rev 55, 2245–2274 (2022). https://doi.org/10.1007/s10462-021-10059-3
DOI: https://doi.org/10.1007/s10462-021-10059-3