Abstract
3D displays have become an inevitable trend in display technology, and converting traditional and classic 2D videos to 3D is an important and effective way to alleviate the shortage of 3D content. The two central tasks in 2D-to-3D video conversion are extracting depth information from the 2D video and synthesizing an image at a new viewpoint. We propose a depth extraction method based on dense edge-preserving optical flow, which reduces matching errors in textureless regions. We further fill the holes that appear in the new-view image after 3D warping using cross-scale Gaussian and Laplacian pyramids. Experiments show that our results outperform state-of-the-art methods both visually and quantitatively.
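The cross-scale pyramid hole filling mentioned above can be illustrated as a pull-push scheme: valid pixels are averaged into progressively coarser levels (where holes shrink and eventually vanish), and coarse values are then pushed back into the holes at finer levels. The sketch below is a minimal pure-NumPy toy, not the authors' implementation: the function names, the 2x2 box-filter pyramid, and the nearest-neighbour upsampling are all illustrative simplifications of the Gaussian/Laplacian pyramid construction described in the paper, and it assumes even image dimensions.

```python
import numpy as np

def downsample(img):
    # One pyramid level: 2x2 box filter followed by decimation
    # (a crude stand-in for a Gaussian pyramid level).
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return (img[0::2, 0::2] + img[1::2, 0::2] +
            img[0::2, 1::2] + img[1::2, 1::2]) / 4.0

def upsample(img, shape):
    # Nearest-neighbour expansion back to the requested (even) shape.
    up = np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)
    return up[:shape[0], :shape[1]]

def fill_holes(img, mask, levels=4):
    # Pull: average only the valid pixels into each coarser level, so holes
    # shrink as the pyramid gets coarser. Push: copy the coarse estimates
    # back into the holes at the finer level; valid pixels are untouched.
    # mask is True where a pixel is a disocclusion hole after 3D warping.
    valid = (~mask).astype(float)
    if levels == 0 or min(img.shape) < 4:
        fallback = img[~mask].mean() if valid.any() else 0.0
        return np.where(mask, fallback, img)
    num = downsample(img * valid)   # box-filtered sum of valid values
    den = downsample(valid)         # fraction of valid pixels per block
    coarse = np.where(den > 0, num / np.maximum(den, 1e-8), 0.0)
    coarse = fill_holes(coarse, den == 0, levels - 1)
    return np.where(mask, upsample(coarse, img.shape), img)
```

On a smooth image (e.g. a horizontal gradient) with a small rectangular hole, the pushed-back coarse averages land close to the missing values, which is why this family of methods works well for the low-frequency content that dominates disoccluded background regions.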
Acknowledgements
This work was supported by the Industrial Prospective Project of the Jiangsu Technology Department under Grant No. BE2017081.
Cite this article
Yao, L., Liu, Z. & Wang, B. 2D-to-3D conversion using optical flow based depth generation and cross-scale hole filling algorithm. Multimed Tools Appl 78, 10543–10564 (2019). https://doi.org/10.1007/s11042-018-6583-3