Improving Object Detection with Consistent Negative Sample Mining

Wang, Xiaolian; Hu, Xiyuan; Chen, Chen; Fan, Zhenfeng; Peng, Silong

doi:10.1007/978-3-030-36711-4_21

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11954)

Included in the following conference series:

International Conference on Neural Information Processing

Abstract

In object detection, training samples are divided into positives and negatives solely according to their initial positions in the image: samples that have low overlap with the ground-truth boxes are assigned to the negative set, and the rest to the positive set. Once allocated, the negative and positive sets stay fixed throughout training. A commonly overlooked issue is that certain negatives do not keep their original state as training proceeds. They gradually regress towards foreground objects rather than away from them, which contradicts the nature of negatives. Training with such inconsistent negatives may confuse the detector when distinguishing foreground from background, and thus makes training less effective. In this paper, we propose a consistent negative sample mining method to filter out biased negatives during training. Specifically, the neural network takes regression performance into account and dynamically activates only consistent negatives, namely those with both low input intersection-over-union (IoU) and low output IoU, for training. We evaluate our method on the PASCAL VOC and KITTI datasets, and the improvements on both demonstrate the effectiveness of our method.
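The selection rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `iou_matrix` and `consistent_negative_mask`, the box format `(x1, y1, x2, y2)`, and the two thresholds of 0.5 are assumptions made for illustration only. "Input IoU" is taken here to mean the overlap of the initial anchor with its best-matching ground truth, and "output IoU" the overlap of the regressed box.

```python
import numpy as np

def iou_matrix(boxes, gts):
    """Pairwise IoU between boxes (N, 4) and ground truths (M, 4), as (x1, y1, x2, y2)."""
    x1 = np.maximum(boxes[:, None, 0], gts[None, :, 0])
    y1 = np.maximum(boxes[:, None, 1], gts[None, :, 1])
    x2 = np.minimum(boxes[:, None, 2], gts[None, :, 2])
    y2 = np.minimum(boxes[:, None, 3], gts[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    area_g = (gts[:, 2] - gts[:, 0]) * (gts[:, 3] - gts[:, 1])
    union = area_b[:, None] + area_g[None, :] - inter
    return inter / np.maximum(union, 1e-9)

def consistent_negative_mask(anchors, regressed, gts, in_thresh=0.5, out_thresh=0.5):
    """True for samples that are negative by input IoU AND still low by output IoU.

    Both thresholds are hypothetical values, not taken from the paper.
    """
    in_iou = iou_matrix(anchors, gts).max(axis=1)     # overlap before regression
    out_iou = iou_matrix(regressed, gts).max(axis=1)  # overlap after regression
    initial_negative = in_iou < in_thresh
    return initial_negative & (out_iou < out_thresh)

# Toy data: one ground-truth box and three samples.
gts = np.array([[10.0, 10.0, 50.0, 50.0]])
anchors = np.array([
    [12.0, 12.0, 52.0, 52.0],      # high input IoU: a positive
    [30.0, 30.0, 70.0, 70.0],      # low input IoU: an initial negative
    [100.0, 100.0, 140.0, 140.0],  # no overlap: an initial negative
])
regressed = np.array([
    [10.0, 10.0, 50.0, 50.0],      # positive regresses onto the object
    [14.0, 14.0, 54.0, 54.0],      # negative drifts onto the object: inconsistent
    [98.0, 98.0, 138.0, 138.0],    # negative stays in the background: consistent
])
mask = consistent_negative_mask(anchors, regressed, gts)
```

In this toy case only the third sample would remain active as a negative; the second starts as a negative but regresses towards the object, so a rule of this kind would exclude it from the negative loss for that iteration.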

Notes

1. Our code is based on https://github.com/pierluigiferrari/ssd_keras.
2. Softmax loss is used in both Initial and OHEM; thus Initial+CNSM can be compared with OHEM to evaluate the effectiveness of CNSM.

Acknowledgments

All correspondence should be addressed to Chen Chen, the corresponding author, at chen.chen@ia.ac.cn. This work was supported by the National Science Foundation of China under Grant NSFC 61906194.

Author information

Authors and Affiliations

Institute of Automation, Chinese Academy of Sciences, Beijing, China
Xiaolian Wang, Xiyuan Hu, Chen Chen, Zhenfeng Fan & Silong Peng
University of Chinese Academy of Sciences, Beijing, China
Xiaolian Wang, Xiyuan Hu, Chen Chen, Zhenfeng Fan & Silong Peng
Beijing ViSystem Corporation Limited, Beijing, China
Silong Peng

Corresponding author

Correspondence to Chen Chen.

Editor information

Editors and Affiliations

Australian National University, Canberra, ACT, Australia
Tom Gedeon
Murdoch University, Murdoch, WA, Australia
Kok Wai Wong
Kyungpook National University, Daegu, Korea (Republic of)
Minho Lee

About this paper

Cite this paper

Wang, X., Hu, X., Chen, C., Fan, Z., Peng, S. (2019). Improving Object Detection with Consistent Negative Sample Mining. In: Gedeon, T., Wong, K., Lee, M. (eds) Neural Information Processing. ICONIP 2019. Lecture Notes in Computer Science, vol 11954. Springer, Cham. https://doi.org/10.1007/978-3-030-36711-4_21

DOI: https://doi.org/10.1007/978-3-030-36711-4_21
Published: 09 December 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-36710-7
Online ISBN: 978-3-030-36711-4
eBook Packages: Computer Science, Computer Science (R0)
