
PDS-Net: A novel point and depth-wise separable convolution for real-time object detection

Regular Paper
International Journal of Multimedia Information Retrieval

Abstract

Numerous object detectors and classifiers have achieved acceptable performance in recent years by using convolutional neural networks and other efficient architectures. However, most of them still suffer from overfitting, high computational cost, and low efficiency and accuracy in real-time scenarios. This paper proposes a new lightweight model for detecting and classifying objects in images. The model comprises a backbone for extracting in-depth features and a spatial feature pyramid network (SFPN) for accurately detecting and categorizing objects. The proposed backbone uses point-wise separable (PWS) and depth-wise separable convolutions, which are more efficient than standard convolution; the PWS convolution employs a residual shortcut link to reduce computation time. We also propose an SFPN comprising concatenation, transformer encoder–decoder, and feature fusion modules, which enables the simultaneous processing of multi-scale features, the extraction of low-level characteristics, and the construction of a feature pyramid, increasing the effectiveness of the proposed model. The proposed model outperforms existing backbones for object detection and classification on three publicly available datasets: PASCAL VOC 2007, PASCAL VOC 2012, and MS-COCO. Our extensive experiments show mAP improvements over state-of-the-art detectors of 2.4% and 2.5% on VOC 2007, 3.0% and 2.6% on VOC 2012, and 2.5% and 3.6% on MS-COCO for the small and large input sizes, respectively. On MS-COCO, our model achieves 39.4 and 33.1 FPS on a single GPU for the small (\(320 \times 320\)) and large (\(512 \times 512\)) input sizes, respectively, showing that our method runs in real time.
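The abstract describes the backbone as built from depth-wise separable and point-wise separable convolutions with a residual shortcut; the exact PDS-Net block design is given in the full paper. As a rough illustration of that general idea only, the following is a minimal PyTorch sketch. The layer sizes, ordering, and the class name `SeparableResidualBlock` are illustrative assumptions, not the authors' actual configuration.

```python
import torch
import torch.nn as nn


class SeparableResidualBlock(nn.Module):
    """Depth-wise 3x3 conv followed by a point-wise 1x1 conv, with a residual shortcut.

    This only sketches the generic separable-convolution-plus-shortcut pattern;
    it is not the PDS-Net block as published.
    """

    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        # Depth-wise convolution: one 3x3 filter per input channel (groups=in_channels).
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size=3,
                                   padding=1, groups=in_channels, bias=False)
        # Point-wise (1x1) convolution mixes channels at low cost.
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_channels)
        self.act = nn.ReLU(inplace=True)
        # Residual shortcut; a 1x1 projection is used when channel counts differ.
        self.shortcut = (nn.Identity() if in_channels == out_channels
                         else nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.act(self.bn(self.pointwise(self.depthwise(x))))
        return out + self.shortcut(x)


# Example: one block applied to a 320x320 RGB input (the paper's small input size).
x = torch.randn(1, 3, 320, 320)
block = SeparableResidualBlock(3, 32)
print(block(x).shape)  # torch.Size([1, 32, 320, 320])
```

Compared with a standard 3x3 convolution, splitting the operation into a per-channel depth-wise filter and a 1x1 channel-mixing step reduces both parameters and multiply–accumulate operations, which is the efficiency argument the abstract makes for the proposed backbone.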



Acknowledgements

This work was supported in part by the Scientific and Technological Research Council of Turkey (TUBITAK) through the 2232 Outstanding International Researchers Program under Project No. 118C301.


Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Junayed, M.S., Islam, M.B., Imani, H. et al. PDS-Net: A novel point and depth-wise separable convolution for real-time object detection. Int J Multimed Info Retr 11, 171–188 (2022). https://doi.org/10.1007/s13735-022-00229-6

