Abstract
Video object segmentation is an important field in computer vision, but it faces unavoidable challenges such as background clutter, occlusion and edge ambiguity. In addition, existing labeled video object segmentation datasets are limited in size, which prevents CNN models from reaching their full generalization capability. In this paper, we propose a novel data augmentation approach called random grid-hiding (RGH). During model training, we divide each training image into several rectangular regions and randomly hide some of them. A convolutional neural network naturally focuses on the most discriminative parts of an image; when those parts are hidden, the network is compelled to attend to other relevant parts of the image. Furthermore, occluded images are randomly generated at various levels of occlusion. Random grid-hiding therefore lets the network learn more features, which effectively reduces the risk of overfitting. Our approach is an effective extension of standard data augmentation techniques such as random cropping and random flipping, and it improves the accuracy of video object segmentation on the DAVIS dataset. Our experimental results show that the proposed method is a stable and effective data augmentation method.
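The augmentation described above can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: the candidate grid divisions, the per-cell hiding probability `hide_prob`, and the `fill_value` written into hidden cells are hypothetical choices, and drawing a different number of grid cells per call is only one possible reading of generating occlusion "at various levels". Leaving the ground-truth mask untouched is also an assumption of this sketch.

```python
import numpy as np


def random_grid_hide(image, divisions=(2, 4, 8), hide_prob=0.3,
                     fill_value=0, rng=None):
    """Return a copy of `image` (H x W or H x W x C) with random grid cells hidden.

    Hypothetical parameters (assumptions, not values from the paper):
      divisions:  candidate numbers of cells per side; one is drawn per call,
                  so the granularity of the occlusion varies between samples.
      hide_prob:  independent probability of hiding each cell.
      fill_value: value written into hidden cells (e.g. 0 or a dataset mean).
    """
    rng = np.random.default_rng() if rng is None else rng
    out = image.copy()  # never modify the caller's array
    h, w = out.shape[:2]
    n = int(rng.choice(divisions))
    # Cell boundaries; linspace also handles sizes not divisible by n.
    ys = np.linspace(0, h, n + 1).astype(int)
    xs = np.linspace(0, w, n + 1).astype(int)
    for i in range(n):
        for j in range(n):
            if rng.random() < hide_prob:
                out[ys[i]:ys[i + 1], xs[j]:xs[j + 1]] = fill_value
    return out


# Usage on a synthetic 480p RGB frame; the segmentation mask is not modified.
rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(480, 854, 3), dtype=np.uint8)
augmented = random_grid_hide(frame, rng=rng)
```

Because each cell is hidden independently, the occluded fraction of the image varies from sample to sample even at a fixed `hide_prob`. In this sketch the function is meant to be applied on the fly each time a training image is loaded, alongside random cropping and flipping, rather than to precomputed copies.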
Acknowledgements
This work was supported by the National Natural Science Foundation of China (61772530), Natural Science Foundation of Jiangsu Province of China (BK20171192), and the Fundamental Research Funds for the Central Universities (No. 2017XKQY075).
Cite this article
Wang, S., Yao, R., Zhang, Y. et al. Data augmentation of random grid-hiding for video object segmentation. Multimed Tools Appl 78, 23029–23048 (2019). https://doi.org/10.1007/s11042-019-7569-5