Abstract
Recently, the overwhelming majority of object detection methods have focused on reducing the number of region proposals while maintaining high object recall, without considering category information. This can lead to many false positives caused by interference between categories, especially when the number of categories is very large. To eliminate such interference, we propose a novel category aggregation approach based on our observation that categories detected more frequently around an object are more likely to be present in the image. By further exploiting the co-occurrence relationships between categories, we can determine the most probable categories for an image in advance, so that many false positives can be filtered out before the subsequent classification process. Extensive experiments on the well-known ILSVRC 2015 detection dataset show that our approach achieves 49.0% mAP on the validation set and 45.36% mAP on the test set, ranking 5th in the ILSVRC 2015 detection task.
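The abstract does not give the exact scoring rule, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that an upstream classifier has already assigned a (category, score) label to each region proposal in an image, and that a category co-occurrence table has been precomputed. The function names, the `min_score`, `weight` and `top_k` parameters, and the additive weighting scheme are all hypothetical. For simplicity, the sketch aggregates labels over all proposals in the image rather than over a spatial neighbourhood of each object.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Hypothetical per-proposal label: (category name, classifier confidence).
Detection = Tuple[str, float]


def aggregate_categories(detections: List[Detection],
                         min_score: float = 0.0) -> Counter:
    """Count how often each category is predicted among the proposals."""
    return Counter(cat for cat, score in detections if score >= min_score)


def refine_with_cooccurrence(counts: Counter,
                             cooccur: Dict[str, Dict[str, float]],
                             weight: float = 0.5) -> Dict[str, float]:
    """Boost each category by the support of categories it co-occurs with.

    Assumed scheme: score(c) = count(c) + weight * sum_o cooccur[c][o] * count(o).
    """
    scores: Dict[str, float] = {}
    for cat, n in counts.items():
        support = sum(cooccur.get(cat, {}).get(other, 0.0) * m
                      for other, m in counts.items() if other != cat)
        scores[cat] = n + weight * support
    return scores


def select_categories(scores: Dict[str, float], top_k: int) -> Set[str]:
    """Keep the top_k most probable categories (ties broken by name)."""
    ranked = sorted(scores, key=lambda c: (-scores[c], c))
    return set(ranked[:top_k])


def filter_detections(detections: List[Detection],
                      keep: Set[str]) -> List[Detection]:
    """Drop detections whose category was not selected for this image."""
    return [d for d in detections if d[0] in keep]


if __name__ == "__main__":
    dets = [("dog", 0.9), ("dog", 0.8), ("cat", 0.6),
            ("frisbee", 0.4), ("dog", 0.7), ("whale", 0.3)]
    cooccur = {"dog": {"frisbee": 0.8}, "frisbee": {"dog": 0.8}}
    scores = refine_with_cooccurrence(aggregate_categories(dets), cooccur)
    keep = select_categories(scores, top_k=2)
    print(sorted(keep))
    print(filter_detections(dets, keep))
```

In this toy example, "frisbee" is detected only once, but it gains support from the frequently detected "dog" through the co-occurrence table. It therefore survives the cut while the isolated "cat" and "whale" labels are discarded as likely false positives.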
This work was supported by the 863 Project (2014AA015202), the National Natural Science Foundation of China (61572472), the Beijing Natural Science Foundation (4152050) and the Beijing Advanced Innovation Center for Imaging Technology (BAICIT-2016009).
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Li, L., Tang, S., Zhou, J., Wang, B., Tian, Q. (2016). Category Aggregation Among Region Proposals for Object Detection. In: Chen, E., Gong, Y., Tie, Y. (eds) Advances in Multimedia Information Processing - PCM 2016. PCM 2016. Lecture Notes in Computer Science, vol 9917. Springer, Cham. https://doi.org/10.1007/978-3-319-48896-7_21
DOI: https://doi.org/10.1007/978-3-319-48896-7_21
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-48895-0
Online ISBN: 978-3-319-48896-7
eBook Packages: Computer Science, Computer Science (R0)