CDSSD: Refreshing Single Shot Object Detection Using a Conv-Deconv Network

Gabale, Vijay; Sawant, Uma

doi:10.1007/978-3-319-93040-4_25

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10939)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Abstract

Single shot multi-box object detectors [13] have recently been shown to achieve state-of-the-art performance on object detection tasks. We extend the single shot detection (SSD) framework in [13] and propose a generic architecture using a deep convolution-deconvolution network. Our architecture does not rely on any pretrained network, and can be pretrained in an unsupervised manner for a given image dataset. Furthermore, we propose a novel approach to combine feature maps from both convolution and deconvolution layers to predict bounding boxes and labels with improved accuracy. Our framework, Conv-Deconv SSD (CDSSD), with its two key contributions – unsupervised pretraining and multi-layer confluence of convolution-deconvolution feature maps – results in state-of-the-art performance while using significantly fewer bounding boxes and improving the identification of small objects. On \(300 \times 300\) image inputs, we achieve 80.7% mAP on VOC07 and 78.1% mAP on VOC07+12 (a 1.7% to 2.8% improvement over StairNet [21], DSSD [5], and SSD [13]). CDSSD achieves 30.2% mAP on COCO, performing on par with R-FCN [3] and Faster R-CNN [18] while working on smaller input images. Furthermore, CDSSD matches SSD performance while using 82% of the data, and reduces the prediction time per image by 10%.
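
To give a sense of scale for the bounding-box budget mentioned in the abstract, the following minimal sketch counts the default boxes of the baseline SSD300 detector as configured by Liu et al. [13]. The feature-map sizes and boxes-per-location values are those of the original SSD, not of CDSSD, whose own configuration is not stated in this abstract; the function names are illustrative.

```python
# Default-box count for the baseline SSD300 detector (Liu et al. [13]).
# Each entry is (feature-map side length, default boxes per location).
# ASSUMPTION: these values describe the original SSD300, not CDSSD.
SSD300_LAYERS = [(38, 4), (19, 6), (10, 6), (5, 6), (3, 4), (1, 4)]


def count_default_boxes(layers):
    """Sum side * side * boxes_per_location over all prediction layers."""
    return sum(side * side * per_cell for side, per_cell in layers)


def per_layer_share(layers):
    """Return (side, fraction of all default boxes) for each layer."""
    total = count_default_boxes(layers)
    return [(side, side * side * per_cell / total) for side, per_cell in layers]


if __name__ == "__main__":
    print("total default boxes:", count_default_boxes(SSD300_LAYERS))
    for side, share in per_layer_share(SSD300_LAYERS):
        print(f"{side:>2}x{side:<2} map: {share:.1%} of boxes")
```

In this baseline, the finest \(38 \times 38\) map alone contributes roughly two thirds of all boxes, which suggests why the choice of prediction layers has a large effect on the total box count.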

Notes

1. Our network is not symmetric. During deconvolution, we simply apply learned upsampling and learned deconvolution without residual blocks.
2. Due to the reduced batch size, the number of batches (iterations) is increased compared to the original SSD work.
3. Details omitted due to lack of space.

References

1. Bell, S., Zitnick, C.L., Bala, K., Girshick, R.: Inside-outside net: detecting objects in context with skip pooling and recurrent neural networks. In: CVPR (2016)
2. Cai, Z., Fan, Q., Feris, R.S., Vasconcelos, N.: A unified multi-scale deep convolutional neural network for fast object detection. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9908, pp. 354–370. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46493-0_22
3. Dai, J., Li, Y., He, K., Sun, J.: R-FCN: object detection via region-based fully convolutional networks. CoRR abs/1605.06409 (2016)
4. Everingham, M., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A.: The PASCAL visual object classes (VOC) challenge. IJCV 88(2), 303–338 (2010)
5. Fu, C., Liu, W., Ranga, A., Tyagi, A., Berg, A.C.: DSSD: deconvolutional single shot detector. CoRR abs/1701.06659 (2017)
6. Girshick, R.: Fast R-CNN. In: ICCV (2015)
7. Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: CVPR (2014)
8. He, K., Zhang, X., Ren, S., Sun, J.: Spatial pyramid pooling in deep convolutional networks for visual recognition. CoRR abs/1406.4729 (2014)
9. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. CoRR abs/1512.03385 (2015)
10. Jeong, J., Park, H., Kwak, N.: Enhancement of SSD by concatenating feature maps for object detection. CoRR abs/1705.09587 (2017)
11. Lin, T.Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: CVPR (2017)
12. Lin, T., Maire, M., Belongie, S.J., Bourdev, L.D., Girshick, R.B., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft COCO: common objects in context. CoRR abs/1405.0312 (2014)
13. Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., Berg, A.C.: SSD: single shot MultiBox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 21–37. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_2
14. Luo, W., Li, Y., Urtasun, R., Zemel, R.S.: Understanding the effective receptive field in deep convolutional neural networks. CoRR abs/1701.04128 (2017)
15. Noh, H., Hong, S., Han, B.: Learning deconvolution network for semantic segmentation. CoRR abs/1505.04366 (2015)
16. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: CVPR (2016)
17. Ren, J.S.J., Chen, X., Liu, J., Sun, W., Pang, J., Yan, Q., Tai, Y.W., Xu, L.: Accurate single stage detector using recurrent rolling convolution. In: CVPR (2017)
18. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: NIPS (2015)
19. Sermanet, P., Eigen, D., Zhang, X., Mathieu, M., Fergus, R., LeCun, Y.: OverFeat: integrated recognition, localization and detection using convolutional networks. In: ICLR (2014)
20. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. CoRR abs/1409.1556 (2014)
21. Woo, S., Hwang, S., Kweon, I.S.: StairNet: top-down semantic aggregation for accurate one shot detection. CoRR abs/1709.05788 (2017)
22. Yu, F., Koltun, V.: Multi-scale context aggregation by dilated convolutions. CoRR abs/1511.07122 (2015)
23. Zhou, B., Khosla, A., Lapedriza, À., Oliva, A., Torralba, A.: Object detectors emerge in deep scene CNNs. CoRR abs/1412.6856 (2014)

Author information

Authors and Affiliations

Huew, Bangalore, India
Vijay Gabale
IIT Bombay, Mumbai, India
Uma Sawant

Corresponding author

Correspondence to Vijay Gabale.

Editor information

Editors and Affiliations

Deakin University, Geelong, Victoria, Australia
Dinh Phung
National Chiao Tung University, Hsinchu City, Taiwan
Vincent S. Tseng
Monash University, Clayton, Victoria, Australia
Geoffrey I. Webb
Japan Advanced Institute of Science and Technology, Nomi, Ishikawa, Japan
Bao Ho
University of Melbourne, Melbourne, Victoria, Australia
Mohadeseh Ganji
University of Melbourne, Melbourne, Victoria, Australia
Lida Rashidi

Cite this paper

Gabale, V., Sawant, U. (2018). CDSSD: Refreshing Single Shot Object Detection Using a Conv-Deconv Network. In: Phung, D., Tseng, V., Webb, G., Ho, B., Ganji, M., Rashidi, L. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2018. Lecture Notes in Computer Science, vol 10939. Springer, Cham. https://doi.org/10.1007/978-3-319-93040-4_25

DOI: https://doi.org/10.1007/978-3-319-93040-4_25
Published: 17 June 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93039-8
Online ISBN: 978-3-319-93040-4
eBook Packages: Computer Science, Computer Science (R0)
