Skip to main content
Log in

Mask-guided SSD for small-object detection

  • Published:
Applied Intelligence Aims and scope Submit manuscript

Abstract

Detecting small objects is a challenging job for the single-shot multibox detector (SSD) model due to the limited information contained in features and complex background interference. Here, we increased the performance of the SSD for detecting target objects with small size by enhancing detection features with contextual information and introducing a segmentation mask to eliminate background regions. The proposed model is referred to as a “guided SSD” (Mask-SSD) and includes two branches: a detection branch and a segmentation branch. We created a feature-fusion module to allow the detection branch to exploit contextual information for feature maps with large resolution, with the segmentation branch primarily built with atrous convolution to provide additional contextual information to the detection branch. The input of the segmentation branch was also the output of the detection branch, and output segmentation features were fused with detection features in order to classify and locate target objects. Additionally, segmentation features were applied to generate the mask, which was utilized to guide the detection branch to find objects in potential foreground regions. Evaluation of Mask-SSD on the Tsinghua-Tencent 100K and Caltech pedestrian datasets demonstrated its effectiveness at detecting small objects and comparable performance relative to other state-of-the-art methods.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9

Similar content being viewed by others

References

  1. Esmaeilzadeh S, Yang Y, Adeli E (2018) End-to-end parkinson disease diagnosis using brain mr-images by 3d-cnn. arXiv:180605233

  2. Lee S, Kim N, Jeong K, Park K, Paik J (2015) Moving object detection using unstable camera for video surveillance systems. Optik 126(20):2436–2441

    Article  Google Scholar 

  3. Dou J, Fang J, Li T, Xue J (2017) Boosting cnn-based pedestrian detection via 3d lidar fusion in autonomous driving. In: International conference on image and graphics, Springer, pp 3–13

  4. Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 779–788

  5. Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C Y, Berg A C (2016) Ssd: single shot multibox detector. In: European conference on computer vision, springer, pp 21–37

  6. Lin T Y, Goyal P, Girshick R, He K, Dollár P (2017) Focal loss for dense object detection. In: Proceedings of the IEEE international conference on computer vision, pp 2980–2988

  7. Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 580–587

  8. Girshick R (2015) Fast r-cnn. In: Proceedings of the IEEE international conference on computer vision, pp 1440–1448

  9. Ren S, He K, Girshick R, Sun J (2015) Faster r-cnn: towards real-time object detection with region proposal networks. In: Advances in neural information processing systems, pp 91–99

  10. Everingham M, Van Gool L, Williams C K, Winn J, Zisserman A (2010) The pascal visual object classes (voc) challenge. Int J Comput Vis 88(2):303–338

    Article  Google Scholar 

  11. Lin T Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Zitnick C L (2014) Microsoft coco: common objects in context. In: European conference on computer vision, springer, pp 740–755

  12. Kisantal M, Wojna Z, Murawski J, Naruniec J, Cho K (2019) Augmentation for small object detection. arXiv:190207296

  13. Han C, Gao G, Zhang Y (2019) Real-time small traffic sign detection with revised faster-rcnn. Multimedia Tools and Applications 78(10):13263–13278

    Article  Google Scholar 

  14. Zhu Z, Liang D, Zhang S, Huang X, Li B, Hu S (2016) Traffic-sign detection and classification in the wild. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2110–2118

  15. Dollar P, Wojek C, Schiele B, Perona P (2011) Pedestrian detection: an evaluation of the state of the art. IEEE Trans Pattern Anal Mach Intell 34(4):743–761

    Article  Google Scholar 

  16. Bai Y, Zhang Y, Ding M, Ghanem B (2018) Sod-mtgan: small object detection via multi-task generative adversarial network. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 206–221

  17. Hu P, Ramanan D (2017) Finding tiny faces. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 951–959

  18. Fu CY, Liu W, Ranga A, Tyagi A, Berg AC (2017) Dssd: deconvolutional single shot detector. arXiv:170106659

  19. Zhang Z, Qiao S, Xie C, Shen W, Wang B, Yuille A L (2018) Single-shot object detection with enriched semantics. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5813–5821

  20. He K, Gkioxari G, Dollár P, Girshick R (2017) Mask r-cnn. In: Proceedings of the IEEE international conference on computer vision, pp 2961–2969

  21. Wang G, Xiong Z, Liu D, Luo C (2018) Cascade mask generation framework for fast small object detection. In: 2018 IEEE international conference on multimedia and expo (ICME), IEEE, pp 1–6

  22. Chen L C, Papandreou G, Kokkinos I, Murphy K, Yuille A L (2017) Deeplab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Trans Pattern Anal Mach Intell 40(4):834–848

    Article  Google Scholar 

  23. Gao P, Zhang Q, Wang F, Xiao L, Fujita H, Zhang Y (2020) Learning reinforced attentional representation for end-to-end visual tracking. Inf Sci 517:52–67

    Article  Google Scholar 

  24. Gao P, Yuan R, Wang F, Xiao L, Fujita H, Zhang Y (2019) Siamese attentional keypoint network for high performance visual tracking. Knowledge-Based Systems p 105448

  25. Yang T, Zhang X, Li Z, Zhang W, Sun J (2018) Metaanchor: learning to detect objects with customized anchors. In: Advances in neural information processing systems, pp 320–330

  26. Liang X, Zhang J, Zhuo L, Li Y, Tian Q (2019) Small object detection in unmanned aerial vehicle images using feature fusion and scaling-based single shot detector with spatial context analysis. IEEE Transactions on Circuits and Systems for Video Technology

  27. Li Z, Peng C, Yu G, Zhang X, Deng Y, Sun J (2017) Light-head r-cnn: in defense of two-stage object detector. arXiv:171107264

  28. Dai J, Li Y, He K, Sun J (2016) R-fcn: Object detection via region-based fully convolutional networks. In: Advances in neural information processing systems, pp 379–387

  29. Cheng B, Wei Y, Shi H, Feris R, Xiong J, Huang T (2018) Decoupled classification refinement: hard false positive suppression for object detection. arXiv:181004002

  30. Wang T, Zhang X, Yuan L, Feng J (2019) Few-shot adaptive faster r-cnn. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7173–7182

  31. Cao G, Xie X, Yang W, Liao Q, Shi G, Wu J (2018) Feature-fused ssd: fast detection for small objects. In: Ninth international conference on graphic and image processing (ICGIP 2017), international society for optics and photonics, vol 10615, p 106151e

  32. Xu M, Cui L, Lv P, Jiang X, Niu J, Zhou B, Wang M (2018) Mdssd: multi-scale deconvolutional single shot detector for small objects. arXiv:180507009

  33. Hu G X, Yang Z, Hu L, Huang L, Han J M (2018) Small object detection with multiscale features. International Journal of Digital Multimedia Broadcasting 2018

  34. Zheng L, Fu C, Zhao Y (2018) Extend the shallow part of single shot multibox detector via convolutional neural network. In: Tenth international conference on digital image processing (ICDIP 2018), international society for optics and photonics, vol 10806, p 1080613

  35. Liu Z, Li D, Sam Ge S, Tian F (2020) Small traffic sign detection from large image. Appl Intell 50:1–13

    Article  Google Scholar 

  36. Li J, Liang X, Wei Y, Xu T, Feng J, Yan S (2017) Perceptual generative adversarial networks for small object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1222–1230

  37. Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7132–7141

  38. Wu B, Iandola F, Jin P H, Keutzer K (2017) Squeezedet: unified, small, low power fully convolutional neural networks for real-time object detection for autonomous driving. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp 129–137

  39. Ashraf K, Wu B, Iandola FN, Moskewicz MW, Keutzer K (2016) Shallow networks for high-accuracy road object-detection. arXiv:160601561

  40. Xie H, Chen Y, Shin H (2019) Context-aware pedestrian detection especially for small-sized instances with deconvolution integrated faster rcnn (dif r-cnn). Appl Intell 49(3):1200–1211

    Article  Google Scholar 

  41. Zhang S, Benenson R, Omran M, Hosang J, Schiele B (2016) How far are we from solving pedestrian detection?. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1259–1267

  42. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv:14091556

  43. Cai Z, Fan Q, Feris R S, Vasconcelos N (2016) A unified multi-scale deep convolutional neural network for fast object detection. In: European conference on computer vision, Springer, pp 354–370

  44. Li J, Liang X, Shen S, Xu T, Feng J, Yan S (2017) Scale-aware fast r-cnn for pedestrian detection. IEEE Trans Multimed 20(4):985–996

    Google Scholar 

  45. Zhang X, Cheng L, Li B, Hu H M (2018) Too far to see? Not really!—pedestrian detection with scale-aware localization policy. IEEE Trans Image Process 27(8):3703–3715

    Article  MathSciNet  Google Scholar 

  46. Zhang S, Yang J, Schiele B (2018) Occluded pedestrian detection through guided attention in cnns. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 6995–7003

  47. Tian Y, Luo P, Wang X, Tang X (2015) Pedestrian detection aided by deep learning semantic tasks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5079–5087

  48. Brazil G, Liu X (2019) Pedestrian detection with autoregressive network phases. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7231–7240

  49. Pfeifer L (2019) Shearlet features for pedestrian detection. Journal of Mathematical Imaging and Vision 61(3):292–309

    Article  MathSciNet  Google Scholar 

  50. Yun I, Jung C, Wang X, Hero A O, Kim J K (2019) Part-level convolutional neural networks for pedestrian detection using saliency and boundary box alignment. IEEE Access 7:23027–23037

    Article  Google Scholar 

  51. Zhang S, Benenson R, Schiele B (2017) Citypersons: a diverse dataset for pedestrian detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3213–3221

  52. Wei W, Zhou B, Połap D, Woźniak M (2019) A regional adaptive variational pde model for computed tomography image reconstruction. Pattern Recogn 92:64–81

    Article  Google Scholar 

  53. Wei W, Xia X, Wozniak M, Fan X, Damaševičius R, Li Y (2019) Multi-sink distributed power control algorithm for cyber-physical-systems in coal mine tunnels. Comput Netw 161:210–219

    Article  Google Scholar 

  54. Wei W, Song H, Li W, Shen P, Vasilakos A (2017) Gradient-driven parking navigation using a continuous information potential field based on wireless sensor network. Inf Sci 408:100–114

    Article  Google Scholar 

Download references

Acknowledgments

The authors acknowledge funding from the Fundamental Research Funds for Central Universities of China (Nos. FRF-GF-18-009B and FRF-BD-19-001A) and the 111 Project (grant No. B12012).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Weidong Zhang.

Ethics declarations

Conflict of interests

The authors declare there is no conflicts of interest regarding the publication of this paper.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Sun, C., Ai, Y., Wang, S. et al. Mask-guided SSD for small-object detection. Appl Intell 51, 3311–3322 (2021). https://doi.org/10.1007/s10489-020-01949-0

Download citation

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10489-020-01949-0

Keywords

Navigation