Category Aggregation Among Region Proposals for Object Detection

Li, Linghui; Tang, Sheng; Zhou, Jianshe; Wang, Bin; Tian, Qi

doi:10.1007/978-3-319-48896-7_21

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9917)

Included in the following conference series:

Pacific Rim Conference on Multimedia

Abstract

Recently, the overwhelming majority of object detection methods have focused on reducing the number of region proposals while maintaining high object recall, without considering category information. This can produce many false positives because of interference between categories, especially when the number of categories is very large. To eliminate such interference, we propose a novel category aggregation approach based on our observation that categories detected more frequently around an object have higher probabilities of being present in the image. After further exploiting the co-occurrence relationships between categories, we can determine the most probable categories for an image in advance. Many false positives can thus be filtered out before the subsequent classification process. Our extensive experiments on the well-known ILSVRC 2015 detection dataset show that our approach achieves an mAP of 49.0% on the validation set and 45.36% on the test set, ranking 5th in the ILSVRC 2015 detection task.
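
The idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how frequencies are counted or how co-occurrence is scored, so the function names, the `top_k` and `cooc_threshold` parameters, and the toy co-occurrence table below are all assumptions made for illustration.

```python
from collections import Counter


def aggregate_categories(proposal_labels, cooccurrence, top_k=2, cooc_threshold=0.5):
    """Guess which categories are present in an image (hypothetical helper).

    proposal_labels: one coarse category guess per region proposal.
    cooccurrence: dict mapping a category to {other_category: score}; the
                  scores here are made-up illustrative values.
    """
    counts = Counter(proposal_labels)
    # Assumption: the top_k most frequently detected categories are likely present.
    seeds = [category for category, _ in counts.most_common(top_k)]
    likely = set(seeds)
    # Assumption: add categories that co-occur strongly with any seed category.
    for seed in seeds:
        for other, score in cooccurrence.get(seed, {}).items():
            if score >= cooc_threshold:
                likely.add(other)
    return likely


def filter_proposals(proposals, likely):
    """Keep only (box_id, label) proposals whose label is a likely category."""
    return [p for p in proposals if p[1] in likely]


# Toy data: seven proposals with coarse labels.
proposals = [
    (0, "dog"), (1, "dog"), (2, "dog"),
    (3, "person"), (4, "person"),
    (5, "sofa"), (6, "car"),
]
cooccurrence = {"person": {"sofa": 0.6, "car": 0.2}}

likely = aggregate_categories([label for _, label in proposals], cooccurrence)
kept = filter_proposals(proposals, likely)
```

In this toy case, "dog" and "person" are the most frequent labels, "sofa" is added through its assumed co-occurrence with "person", and the single "car" proposal is discarded before any further classification would run.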

This work was supported by the 863 Project (2014AA015202), the National Natural Science Foundation of China (61572472), the Beijing Natural Science Foundation (4152050), and the Beijing Advanced Innovation Center for Imaging Technology (BAICIT-2016009).

Author information

Authors and Affiliations

Key Lab of Intelligent Information Processing, Institute of Computing Technology, CAS, Beijing, 100190, China
Linghui Li, Sheng Tang & Bin Wang
Beijing Advanced Innovation Center for Imaging Technology, Capital Normal University, Beijing, 100048, People’s Republic of China
Jianshe Zhou
Department of Computer Science, University of Texas at San Antonio, San Antonio, Texas, 78249-1604, USA
Qi Tian

Corresponding author

Correspondence to Sheng Tang.

Editor information

Editors and Affiliations

Zhengzhou University, Zhengzhou, China
Enqing Chen
Jiaotong University, Xi’an, China
Yihong Gong
Zhengzhou University, Zhengzhou, China
Yun Tie

Cite this paper

Li, L., Tang, S., Zhou, J., Wang, B., Tian, Q. (2016). Category Aggregation Among Region Proposals for Object Detection. In: Chen, E., Gong, Y., Tie, Y. (eds) Advances in Multimedia Information Processing - PCM 2016. Lecture Notes in Computer Science, vol 9917. Springer, Cham. https://doi.org/10.1007/978-3-319-48896-7_21

DOI: https://doi.org/10.1007/978-3-319-48896-7_21
Published: 27 November 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-48895-0
Online ISBN: 978-3-319-48896-7
eBook Packages: Computer Science, Computer Science (R0)
