Data augmentation of random grid-hiding for video object segmentation

Wang, Shi; Yao, Rui; Zhang, Yikun; Jiang, Qingnan; Zhang, Changbin

doi:10.1007/s11042-019-7569-5

Published: 27 April 2019

Volume 78, pages 23029–23048 (2019)

Multimedia Tools and Applications

Shi Wang¹,
Rui Yao (ORCID: orcid.org/0000-0003-2734-915X)¹,
Yikun Zhang¹,
Qingnan Jiang¹ &
Changbin Zhang¹

Abstract

Video object segmentation is an important field in computer vision. However, challenges such as background clutter, occlusion and edge ambiguity are unavoidable in this task. In addition, existing labeled video object segmentation datasets are limited in size, which prevents CNN models from reaching their full generalization capabilities. In this paper, we propose a novel data augmentation approach called random grid-hiding (RGH). We divide each training image into several rectangular regions and randomly hide some of them during model training. The convolutional neural network normally focuses on the most discriminative parts of the image; when those parts are hidden, the network is compelled to focus on other relevant parts. In addition, occluded images are randomly generated at various levels of occlusion. Random grid-hiding thus exposes the network to more features, which can effectively reduce the risk of overfitting. Our approach complements existing data augmentation techniques (such as random cropping and random flipping) and improves the accuracy of video object segmentation on the DAVIS dataset. Our experimental results show that the proposed method is a stable and effective form of data augmentation.
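The hiding step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `random_grid_hide`, the grid size, the per-cell hiding probability `hide_prob`, and the fill value are all assumptions chosen for illustration, since the abstract does not state them.

```python
import numpy as np


def random_grid_hide(image, grid_rows=4, grid_cols=4, hide_prob=0.3,
                     fill_value=0, rng=None):
    """Return a copy of `image` with randomly chosen grid cells filled.

    image: array of shape (H, W) or (H, W, C).
    grid_rows, grid_cols, hide_prob and fill_value are illustrative
    hyperparameters (assumptions), not values taken from the paper.
    """
    if rng is None:
        rng = np.random.default_rng()
    out = np.array(image, copy=True)  # leave the input untouched
    height, width = out.shape[:2]
    # Cell boundaries; linspace copes with sizes not divisible by the grid.
    row_edges = np.linspace(0, height, grid_rows + 1).astype(int)
    col_edges = np.linspace(0, width, grid_cols + 1).astype(int)
    # One independent random draw per cell decides whether it is hidden.
    hidden = rng.random((grid_rows, grid_cols)) < hide_prob
    for r in range(grid_rows):
        for c in range(grid_cols):
            if hidden[r, c]:
                out[row_edges[r]:row_edges[r + 1],
                    col_edges[c]:col_edges[c + 1]] = fill_value
    return out, hidden


# Usage: hide cells of a dummy 120x160 RGB frame with a fixed seed.
frame = np.full((120, 160, 3), 255, dtype=np.uint8)
augmented, hidden_cells = random_grid_hide(
    frame, rng=np.random.default_rng(0))
```

Because each cell is drawn independently, the number of hidden regions varies from call to call, which loosely mirrors the "various levels" of occlusion mentioned above. In this sketch the function would be applied only at training time; whether the ground-truth segmentation mask should be altered as well is not stated in the abstract and is left open here.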

Acknowledgements

This work was supported by the National Natural Science Foundation of China (61772530), the Natural Science Foundation of Jiangsu Province of China (BK20171192), and the Fundamental Research Funds for the Central Universities (No. 2017XKQY075).

Author information

Authors and Affiliations

School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, 221116, China
Shi Wang, Rui Yao, Yikun Zhang, Qingnan Jiang & Changbin Zhang

Corresponding author

Correspondence to Rui Yao.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wang, S., Yao, R., Zhang, Y. et al. Data augmentation of random grid-hiding for video object segmentation. Multimed Tools Appl 78, 23029–23048 (2019). https://doi.org/10.1007/s11042-019-7569-5

Received: 09 August 2018
Revised: 12 February 2019
Accepted: 02 April 2019
Published: 27 April 2019
Issue Date: 30 August 2019
DOI: https://doi.org/10.1007/s11042-019-7569-5
