Abstract
In object detection, training samples are divided into negatives and positives according only to their initial positions in the image: samples with low overlap with the ground truth are assigned as negatives, and the rest as positives. Once assigned, the negative and positive sets stay fixed throughout training. A commonly overlooked issue is that some negatives do not keep their original state as training proceeds. They gradually regress towards foreground objects rather than away from them, which contradicts the nature of negatives. Training with such inconsistent negatives may confuse the detector when it distinguishes foreground from background, and thus makes training less effective. In this paper, we propose a consistent negative sample mining (CNSM) method that filters out these biased negatives during training. Specifically, the network takes regression performance into account and dynamically activates only consistent negatives, i.e., those with both low input IoU and low output IoU, for training. We evaluate our method on the PASCAL VOC and KITTI datasets, and the improvements on both demonstrate its effectiveness.
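The selection rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that "input IoU" means the overlap between a sample's initial box and the ground truth, and that "output IoU" means the overlap between its regressed box and the ground truth. The function names and the 0.5 thresholds are hypothetical choices made for illustration only.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def max_iou(box: Box, gts: Sequence[Box]) -> float:
    """Best overlap of one box with any ground-truth box (0.0 if none)."""
    return max((iou(box, g) for g in gts), default=0.0)


def consistent_negatives(
    initial_boxes: Sequence[Box],
    regressed_boxes: Sequence[Box],
    gts: Sequence[Box],
    input_thr: float = 0.5,   # assumed threshold, not taken from the paper
    output_thr: float = 0.5,  # assumed threshold, not taken from the paper
) -> List[int]:
    """Indices of samples whose input IoU and output IoU are both low."""
    keep = []
    for i, (init, reg) in enumerate(zip(initial_boxes, regressed_boxes)):
        if max_iou(init, gts) < input_thr and max_iou(reg, gts) < output_thr:
            keep.append(i)
    return keep


gts = [(0.0, 0.0, 10.0, 10.0)]
initial = [
    (20.0, 20.0, 30.0, 30.0),  # far from the object
    (6.0, 6.0, 16.0, 16.0),    # low initial overlap (a negative by position)
    (1.0, 1.0, 11.0, 11.0),    # high initial overlap (a positive)
]
regressed = [
    (21.0, 21.0, 31.0, 31.0),  # stays in the background
    (1.0, 1.0, 11.0, 11.0),    # drifts onto the object: inconsistent negative
    (0.0, 0.0, 10.0, 10.0),
]
print(consistent_negatives(initial, regressed, gts))  # prints [0]
```

In this toy example, sample 1 starts as a negative but regresses onto the object, so it is excluded; only sample 0 remains a consistent negative. In a detector, a check of this kind would be re-evaluated during training, because the regressed boxes change as the network learns.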
Notes
1. Our code is based on https://github.com/pierluigiferrari/ssd_keras.
2. Softmax loss is used in both Initial and OHEM, so Initial+CNSM can be compared with OHEM to evaluate the effectiveness of CNSM.
Acknowledgments
All correspondence should be addressed to Chen Chen, the corresponding author, at chen.chen@ia.ac.cn. This work was supported by the National Science Foundation of China under Grant NSFC 61906194.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Wang, X., Hu, X., Chen, C., Fan, Z., Peng, S. (2019). Improving Object Detection with Consistent Negative Sample Mining. In: Gedeon, T., Wong, K., Lee, M. (eds) Neural Information Processing. ICONIP 2019. Lecture Notes in Computer Science, vol 11954. Springer, Cham. https://doi.org/10.1007/978-3-030-36711-4_21
DOI: https://doi.org/10.1007/978-3-030-36711-4_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-36710-7
Online ISBN: 978-3-030-36711-4
eBook Packages: Computer Science, Computer Science (R0)