Improving Object Detection with Consistent Negative Sample Mining

  • Conference paper
Neural Information Processing (ICONIP 2019)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11954)

Abstract

In object detection, training samples are divided into negatives and positives according solely to their initial positions on the image. Samples that have low overlap with the ground truths are assigned as negatives; all others are assigned as positives. Once allocated, the negative and positive sets stay fixed throughout training. A commonly overlooked issue is that certain negatives do not keep their original state as training proceeds: they gradually regress towards foreground objects rather than away from them, which contradicts the nature of negatives. Training with such inconsistent negatives may confuse the detector when it distinguishes foreground from background, and thus make training less effective. In this paper, we propose a consistent negative sample mining method to filter out biased negatives during training. Specifically, the neural network takes regression performance into account and dynamically activates, for training, only consistent negatives, i.e., those with both low input IoUs and low output IoUs. We evaluate our method on the PASCAL VOC and KITTI datasets, and the improvements on both demonstrate its effectiveness.
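
The selection rule described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes that "input IoU" means the overlap between an initial anchor box and its best-matching ground truth, and that "output IoU" means the same overlap measured on the box after regression. The function names `pairwise_iou` and `consistent_negative_mask`, the `(x1, y1, x2, y2)` box layout, and both 0.5 thresholds are hypothetical choices made for illustration only.

```python
import numpy as np


def pairwise_iou(boxes, gts):
    """IoU between every box and every ground truth.

    Both inputs hold (x1, y1, x2, y2) corners; boxes are assumed
    to have positive area, so the union is never zero.
    """
    b = np.asarray(boxes, dtype=float)[:, None, :]  # shape (N, 1, 4)
    g = np.asarray(gts, dtype=float)[None, :, :]    # shape (1, M, 4)
    # Width and height of the intersection, clipped at zero when disjoint.
    iw = np.clip(np.minimum(b[..., 2], g[..., 2]) - np.maximum(b[..., 0], g[..., 0]), 0, None)
    ih = np.clip(np.minimum(b[..., 3], g[..., 3]) - np.maximum(b[..., 1], g[..., 1]), 0, None)
    inter = iw * ih
    area_b = (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    area_g = (g[..., 2] - g[..., 0]) * (g[..., 3] - g[..., 1])
    return inter / (area_b + area_g - inter)         # shape (N, M)


def consistent_negative_mask(anchors, regressed, gts, in_thresh=0.5, out_thresh=0.5):
    """Mark samples whose input IoU and output IoU are both low.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    input_iou = pairwise_iou(anchors, gts).max(axis=1)     # before regression
    output_iou = pairwise_iou(regressed, gts).max(axis=1)  # after regression
    return (input_iou < in_thresh) & (output_iou < out_thresh)


# One ground truth and three samples:
#   0: far from the object before and after regression (consistent negative)
#   1: starts with low overlap but regresses onto the object (inconsistent)
#   2: starts on the object (a positive, never a negative)
gts = [[0, 0, 10, 10]]
anchors = [[20, 20, 30, 30], [6, 6, 16, 16], [0, 0, 10, 10]]
regressed = [[21, 21, 31, 31], [1, 1, 11, 11], [0, 0, 10, 10]]

mask = consistent_negative_mask(anchors, regressed, gts)
print(mask.tolist())  # [True, False, False]
```

In a training loop, a mask of this kind would be recomputed from the current regression outputs at each step, so that only the samples it marks contribute to the negative part of the classification loss. How the selection is actually scheduled and weighted is not specified in the abstract.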

Notes

  1. Our code is based on https://github.com/pierluigiferrari/ssd_keras.

  2. Softmax loss is used in both Initial and OHEM, so Initial+CNSM can be compared with OHEM to evaluate the effectiveness of CNSM.

Acknowledgments

All correspondence should be addressed to Chen Chen, the corresponding author, at chen.chen@ia.ac.cn. This work was supported by the National Science Foundation of China under Grant NSFC 61906194.

Corresponding author

Correspondence to Chen Chen.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Wang, X., Hu, X., Chen, C., Fan, Z., Peng, S. (2019). Improving Object Detection with Consistent Negative Sample Mining. In: Gedeon, T., Wong, K., Lee, M. (eds) Neural Information Processing. ICONIP 2019. Lecture Notes in Computer Science, vol 11954. Springer, Cham. https://doi.org/10.1007/978-3-030-36711-4_21

  • DOI: https://doi.org/10.1007/978-3-030-36711-4_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-36710-7

  • Online ISBN: 978-3-030-36711-4

  • eBook Packages: Computer Science (R0)
