A multi-phase blending method with incremental intensity for training detection networks

  • Original Article
  • Published in The Visual Computer 37, 245–259 (2021)

Abstract

Object detection is an important topic in visual data processing and the visual computing area. Although many approaches have been studied, it remains a challenge. Image classifiers can be improved by training with blended images and correspondingly blended labels. However, our experiments show that directly transferring existing blending methods from classification to object detection makes the training process harder and eventually leads to poor performance. Motivated by this observation, this paper presents a multi-phase blending method with incremental blending intensity that improves the accuracy of object detectors and achieves remarkable improvements. Firstly, to adapt blending to the detection task, we propose a smoothly scheduled, incremental blending intensity that controls the degree of multi-phase blending. Based on this dynamic coefficient, we propose an incremental blending method in which the blending intensity increases smoothly from zero to full, so that progressively more complex and varied data are created to regularize the network. Secondly, we design an incremental hybrid loss function to replace the original loss function; its blending intensity likewise increases smoothly under the control of the scheduled coefficient. Thirdly, we discard more negative examples during our multi-phase training process than typical training procedures do. Together, these steps regularize the neural network, enhance its generalization capability through data diversity, and ultimately improve detection accuracy. A further advantage is that evaluation is unaffected, because our method is applied only during training. Experiments show that the proposed method improves the generalization of detection networks: on PASCAL VOC and MS COCO, our method outperforms RFBNet, a state-of-the-art one-stage detector for real-time processing.
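The abstract outlines three training-time components but gives no implementation details, so the sketch below is a minimal, hypothetical illustration of how the scheduled intensity, the incremental blending, and the hybrid loss could fit together in a PyTorch-style training loop. The half-cosine schedule, the Beta-distributed mixing weight, and all names and parameters (blend_intensity, incremental_blend, incremental_hybrid_loss, alpha, total_epochs) are assumptions for illustration, not the authors' actual implementation.

```python
import numpy as np
import torch


def blend_intensity(epoch: int, total_epochs: int) -> float:
    """Smoothly scheduled blending intensity, ramping from 0 to 1.

    The abstract only says the intensity increases smoothly from zero
    to full; a half-cosine ramp is one plausible realization.
    """
    t = min(epoch / total_epochs, 1.0)
    return 0.5 * (1.0 - np.cos(np.pi * t))


def incremental_blend(images: torch.Tensor, alpha: float, intensity: float):
    """Mixup-style blending whose strength is scaled by the schedule.

    Returns the blended batch, the permutation used to pick partner
    images, and the effective mixing weight `lam`.
    """
    perm = torch.randperm(images.size(0))
    lam = float(np.random.beta(alpha, alpha))
    # Pull lam toward 1 when intensity is low, so early phases see
    # almost-unblended images and late phases see full-strength blends.
    lam = 1.0 - intensity * (1.0 - lam)
    blended = lam * images + (1.0 - lam) * images[perm]
    return blended, perm, lam


def incremental_hybrid_loss(loss_orig, loss_partner, lam):
    """Hybrid detection loss over both sets of blended ground truth."""
    return lam * loss_orig + (1.0 - lam) * loss_partner


# Hypothetical use inside a detector training loop:
#   intensity = blend_intensity(epoch, total_epochs)
#   blended, perm, lam = incremental_blend(images, alpha=1.5,
#                                          intensity=intensity)
#   preds = detector(blended)
#   loss = incremental_hybrid_loss(
#       criterion(preds, targets),
#       criterion(preds, [targets[i] for i in perm]),
#       lam)
```

For the third component, the abstract states only that more negative examples are discarded than in typical training; in an SSD/RFBNet-style pipeline this would plausibly correspond to using a smaller negative-to-positive ratio during hard negative mining, though the exact ratio is not given here.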

Funding

This study was funded by the National Natural Science Foundation of China (NSFC, Grant No. 61472289).

Author information


Corresponding author

Correspondence to Fazhi He.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Quan, Q., He, F. & Li, H. A multi-phase blending method with incremental intensity for training detection networks. Vis Comput 37, 245–259 (2021). https://doi.org/10.1007/s00371-020-01796-7
