Making accurate object detection at the edge: review and new approach

Artificial Intelligence Review

Abstract

With the development of the Internet of Things (IoT), data are increasingly generated at the edge of a network. Processing tasks at the network edge can effectively address the problems of personal privacy leakage and server overloading, and has therefore attracted a great deal of attention. A number of efficient convolutional neural network (CNN) models have been proposed for this purpose. However, because they require substantial computing and memory resources, none of them can be deployed on typical edge computing devices such as Raspberry Pi 3B+ and 4B+ while meeting the real-time requirements of user tasks. Considering that a traditional machine learning method can precisely locate an object at an acceptable computational load, this work reviews the state-of-the-art literature and then proposes a CNN with reduced input size for an object detection system that can be deployed on edge computing devices. The system splits an object detection task into object positioning and classification. In particular, this work proposes a CNN model with 44 \(\times\) 44-pixel inputs, instead of the much larger inputs used in many existing methods (e.g., 224 \(\times\) 224 pixels), for edge computing devices with slow memory access and limited computing resources. Its overall performance has been verified via a facial expression detection system implemented on Raspberry Pi 3B+ and 4B+. The work makes accurate object detection at the edge possible.
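
The split into positioning and classification described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `crop_and_resize`, `detect`, `pixel_ratio`, `dummy_locate`, and `dummy_classify`, the box layout, and the nearest-neighbour resizing are all assumptions made for illustration, and the two `dummy_` functions merely stand in for a lightweight locator and a trained CNN. The only values taken from the abstract are the 44-pixel and 224-pixel input sizes; a 44 x 44 input has 1936 pixels against 50176 for 224 x 224, roughly 26 times fewer.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]        # grayscale image: rows of pixel intensities
Box = Tuple[int, int, int, int]  # (top, left, height, width), hypothetical layout

CNN_INPUT = 44  # input side length given in the abstract


def crop_and_resize(image: Image, box: Box, size: int = CNN_INPUT) -> Image:
    """Crop `box` from `image` and rescale it to size x size (nearest neighbour)."""
    top, left, height, width = box
    return [
        [image[top + (r * height) // size][left + (c * width) // size]
         for c in range(size)]
        for r in range(size)
    ]


def detect(image: Image,
           locate: Callable[[Image], List[Box]],
           classify: Callable[[Image], str]) -> List[Tuple[Box, str]]:
    """Stage 1 positions objects; stage 2 classifies each small crop."""
    return [(box, classify(crop_and_resize(image, box))) for box in locate(image)]


def pixel_ratio(large: int = 224, small: int = CNN_INPUT) -> float:
    """How many times more pixels a large x large input has than small x small."""
    return (large * large) / (small * small)


# Hypothetical stand-ins so the sketch runs without a trained model.
def dummy_locate(image: Image) -> List[Box]:
    return [(10, 20, 88, 88)]  # pretend a lightweight detector found one object


def dummy_classify(patch: Image) -> str:
    mean = sum(map(sum, patch)) / (len(patch) * len(patch[0]))
    return "bright" if mean > 0.5 else "dark"


if __name__ == "__main__":
    # Synthetic 120 x 160 frame with a bright square where the "object" is.
    frame = [[1.0 if 10 <= r < 98 and 20 <= c < 108 else 0.0
              for c in range(160)] for r in range(120)]
    print(detect(frame, dummy_locate, dummy_classify))
    print(round(pixel_ratio(), 1))
```

In this sketch the classifier only ever sees the 44 x 44 crop, so its cost does not depend on the size of the full frame; only the locator, assumed here to be cheap, touches the whole image.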

Notes

  1. https://github.com/tobysunx/face_recognition.

  2. https://github.com/yangshunzhi1994/CNN-RIS.

  3. OpenVINO™ toolkit: https://docs.openvinotoolkit.org/latest/index.html.

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grants 61772366 and 62072192, and by the Natural Science Foundation of Shanghai under Grant 17ZR1445900. The Deanship of Scientific Research (DSR) at King Abdulaziz University, Jeddah, Saudi Arabia, has funded this project under grant no. FP-51-43.

Author information

Corresponding authors

Correspondence to Zhenhua Huang, MengChu Zhou or Zheng Gong.

Cite this article

Huang, Z., Yang, S., Zhou, M. et al. Making accurate object detection at the edge: review and new approach. Artif Intell Rev 55, 2245–2274 (2022). https://doi.org/10.1007/s10462-021-10059-3
