Hierarchical Object Detection and Classification Using SSD Multi-Loss

Zwemer, Matthijs H.; Wijnhoven, Rob G. J.; de With, Peter H. N.

doi:10.1007/978-3-030-94893-1_12

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1474)

Included in the following conference series:

International Joint Conference on Computer Vision, Imaging and Computer Graphics

Abstract

When merging existing, similar datasets, it is attractive to benefit both from a higher detection rate and from the additional, partially annotated ground-truth samples for improving object classification. To this end, a novel CNN detector with a hierarchical binary classification system is proposed. The detector is based on the Single Shot MultiBox Detector (SSD) and is inspired by the hierarchical classification used in the YOLO9000 detector. Localization and classification are separated during training by introducing a novel loss term that handles hierarchical classification in the loss function (SSD-ML). We evaluate the proposed SSD-ML detector on the generic PASCAL VOC dataset and show that additional super-categories can be learned with minimal impact on the overall accuracy. Furthermore, we find that not all objects need classification labels: classification performance drops only from 73.3% to 70.6% when 60% of the label information is removed. The flexibility of the detector with respect to different levels of detail in label definitions is investigated for a traffic surveillance application involving public and proprietary datasets with non-overlapping class definitions. Including classification label information from our dataset raises the performance significantly, from 70.7% to 82.2%. The experiments show that the desired hierarchical labels can be learned from the public datasets while using only box information from our dataset. In general, this shows that it is possible to combine existing datasets with similar object classes and partial annotations and to benefit from both a higher detection rate and improved class categorization performance.
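
As an illustration only, the following Python sketch shows one way a hierarchical binary classification loss could tolerate partial labels. The hierarchy, the function names, and the target-assignment rule (the labeled node and its ancestors are positives, their siblings are negatives, all other nodes are ignored) are assumptions made for this example and are not taken from the chapter. The localization term of the detector is omitted.

```python
import math

# Hypothetical hierarchy (child -> parent); None marks a top-level node.
PARENT = {
    "vehicle": None,
    "person": None,
    "car": "vehicle",
    "bus": "vehicle",
    "truck": "vehicle",
}


def hierarchical_targets(label):
    """Return {node: 1.0 or 0.0} for the nodes that contribute to the loss.

    Assumed rule: the labeled node and its ancestors are positives, their
    siblings are negatives, and every other node is left out (ignored).
    A label of None means "box only" and yields no classification targets.
    """
    targets = {}
    node = label
    while node is not None:
        targets[node] = 1.0
        for other, parent in PARENT.items():
            if parent == PARENT[node] and other != node:
                targets[other] = 0.0
        node = PARENT[node]
    return targets


def binary_cross_entropy(logit, target):
    """Numerically stable sigmoid cross-entropy for a single logit."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def classification_loss(logits, label):
    """Sum the binary losses over the nodes selected by the label."""
    targets = hierarchical_targets(label)
    return sum(binary_cross_entropy(logits[n], t) for n, t in targets.items())


logits = {name: 0.0 for name in PARENT}
loss_leaf = classification_loss(logits, "car")       # all five nodes used
loss_super = classification_loss(logits, "vehicle")  # only top level used
loss_box_only = classification_loss(logits, None)    # no classification term
```

In this sketch, a sample labeled only with a super-category trains the top-level outputs and leaves the sub-category outputs untouched, while a box-only sample contributes nothing to the classification term. This is one plausible reading of how partially annotated datasets can be mixed during training.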

Author information

Authors and Affiliations

Eindhoven University of Technology, Eindhoven, The Netherlands
Matthijs H. Zwemer & Peter H. N. de With
ViNotion B.V., Eindhoven, The Netherlands
Matthijs H. Zwemer & Rob G. J. Wijnhoven

Authors

Matthijs H. Zwemer
Rob G. J. Wijnhoven
Peter H. N. de With

Corresponding author

Correspondence to Matthijs H. Zwemer.

Editor information

Editors and Affiliations

IRISA, University of Rennes 1, Rennes, France
Kadi Bouatouch
Universidade do Porto, Porto, Portugal
A. Augusto de Sousa
University of Genova, Genova, Italy
Manuela Chessa
Mines ParisTech, Paris, France
Alexis Paljic
Linnaeus University, Växjö, Sweden
Andreas Kerren
French Civil Aviation University (ENAC), Toulouse, France
Christophe Hurter
Università di Catania, Catania, Italy
Giovanni Maria Farinella
Universitat de Barcelona, Barcelona, Spain
Petia Radeva
Escola Superior de Tecnologia de Setúbal, Setúbal, Portugal
Jose Braz

Cite this paper

Zwemer, M.H., Wijnhoven, R.G.J., de With, P.H.N. (2022). Hierarchical Object Detection and Classification Using SSD Multi-Loss. In: Bouatouch, K., et al. Computer Vision, Imaging and Computer Graphics Theory and Applications. VISIGRAPP 2020. Communications in Computer and Information Science, vol 1474. Springer, Cham. https://doi.org/10.1007/978-3-030-94893-1_12

DOI: https://doi.org/10.1007/978-3-030-94893-1_12
Published: 22 January 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-94892-4
Online ISBN: 978-3-030-94893-1
eBook Packages: Computer Science, Computer Science (R0)
