Data augmentation of random grid-hiding for video object segmentation

Multimedia Tools and Applications

Abstract

Video object segmentation is an important field in computer vision. However, challenges such as background clutter, occlusion, and edge ambiguity are unavoidable. In addition, existing labeled video object segmentation datasets are limited in size, which prevents CNN models from reaching their full generalization capability. In this paper, we propose a novel data augmentation approach called random grid-hiding (RGH). We divide each training image into several rectangular regions and randomly hide some of them during model training. As a result, the convolutional neural network automatically focuses on the discriminative parts of the image, and when the most discriminative part is hidden, the network is compelled to focus on other related parts. Furthermore, occluded images are randomly generated at various levels of occlusion. Random grid-hiding exposes the network to more features, which effectively reduces the risk of overfitting. Our approach is an effective extension of existing data augmentation techniques (such as random cropping and random flipping) and improves the accuracy of video object segmentation on the DAVIS dataset. Our experimental results show that the proposed method is stable and effective for data augmentation.
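
The grid-hiding step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `random_grid_hide`, the grid size, the per-cell hiding probability `hide_prob`, and the constant `fill_value` are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np


def random_grid_hide(image, grid_rows=4, grid_cols=4, hide_prob=0.25,
                     fill_value=0, rng=None):
    """Hide randomly chosen cells of a regular grid laid over `image`.

    All parameter names and defaults are hypothetical; the abstract only
    states that the image is split into rectangular regions and that
    some regions are hidden at random during training.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = image.copy()  # leave the caller's array untouched
    height, width = image.shape[:2]

    # Integer cell boundaries that cover the whole image.
    row_edges = np.linspace(0, height, grid_rows + 1).astype(int)
    col_edges = np.linspace(0, width, grid_cols + 1).astype(int)

    # Decide independently for each cell whether it is hidden.
    hidden = rng.random((grid_rows, grid_cols)) < hide_prob

    for r in range(grid_rows):
        for c in range(grid_cols):
            if hidden[r, c]:
                out[row_edges[r]:row_edges[r + 1],
                    col_edges[c]:col_edges[c + 1]] = fill_value
    return out, hidden


if __name__ == "__main__":
    frame = np.full((480, 854, 3), 128, dtype=np.uint8)  # dummy RGB frame
    augmented, hidden_cells = random_grid_hide(frame, hide_prob=0.3)
    print("hidden cells:", int(hidden_cells.sum()), "of", hidden_cells.size)
```

Because each cell is hidden independently, the fraction of the image that is occluded varies from sample to sample, which is one plausible reading of occlusion being generated "at various levels". The sketch changes only the input image; whether the segmentation annotation should be altered as well is not stated in the abstract and is left out here.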

Acknowledgements

This work was supported by the National Natural Science Foundation of China (61772530), Natural Science Foundation of Jiangsu Province of China (BK20171192), and the Fundamental Research Funds for the Central Universities (No. 2017XKQY075).

Author information

Corresponding author

Correspondence to Rui Yao.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wang, S., Yao, R., Zhang, Y. et al. Data augmentation of random grid-hiding for video object segmentation. Multimed Tools Appl 78, 23029–23048 (2019). https://doi.org/10.1007/s11042-019-7569-5

  • DOI: https://doi.org/10.1007/s11042-019-7569-5
