3-D Video Generation from Monocular Video Based on Hierarchical Video Segmentation


Abstract

This paper presents an algorithm that generates 3-D video from monocular video through a hierarchical approach that characterizes the low-level features and high-level semantics of the video content to generate a depth map. Color and texture features are first used to characterize the video locally, and the video is segmented into several regions accordingly. The semantics of the segmented regions, including shape and motion, are then delineated from a higher-level perspective and used to refine the segmentation by measuring the interrelations among regions. Finally, based on the refined segmentation map and the region semantics, the proposed method generates a highly stable depth map using both spatial and temporal information. The stable depth map minimizes visual quality degradation, such as flicker and blurring, when viewing the 3-D video. Experimental results show that the proposed algorithm generates high-quality, stable depth maps. In addition, subjective viewing evaluation shows that the proposed algorithm surpasses commercial 2-D-to-3-D video conversion products, including TriDef 3D and CyberLink PowerDVD.
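
The abstract describes a pipeline of three stages: low-level segmentation from color and texture features, semantic refinement of the segmented regions, and spatio-temporally stabilized depth-map generation. The sketch below illustrates such a pipeline in Python with OpenCV; it is not the authors' implementation, and the function names (segment_frame, assign_depth, depth_maps), the vertical-position depth cue, and the smoothing weight ALPHA are assumptions made for illustration only.

# Minimal sketch of the pipeline stages described in the abstract.
# This is NOT the authors' implementation; function names, parameters,
# and the position-based depth cue are illustrative assumptions.
import cv2
import numpy as np

ALPHA = 0.8  # assumed temporal smoothing weight for depth stability

def segment_frame(frame_bgr):
    """Low-level stage: group pixels by color using mean-shift filtering,
    then label connected regions."""
    smoothed = cv2.pyrMeanShiftFiltering(frame_bgr, sp=15, sr=30)
    gray = cv2.cvtColor(smoothed, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    num_labels, labels = cv2.connectedComponents(binary)
    return labels, num_labels

def assign_depth(labels, num_labels, height):
    """Higher-level stage placeholder: assign each region a depth from its
    vertical position (lower regions assumed nearer), a common monocular cue."""
    depth = np.zeros(labels.shape, dtype=np.float32)
    for lbl in range(num_labels):
        mask = labels == lbl
        if mask.any():
            mean_row = np.mean(np.nonzero(mask)[0])
            depth[mask] = mean_row / float(height)  # 0 = far, 1 = near
    return depth

def depth_maps(video_path):
    """Yield one temporally smoothed 8-bit depth map per frame."""
    cap = cv2.VideoCapture(video_path)
    prev_depth = None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        labels, n = segment_frame(frame)
        depth = assign_depth(labels, n, frame.shape[0])
        if prev_depth is not None:
            # Blend with the previous depth map to suppress flicker.
            depth = ALPHA * prev_depth + (1.0 - ALPHA) * depth
        prev_depth = depth
        yield (depth * 255).astype(np.uint8)
    cap.release()

The temporal blend in depth_maps corresponds to the stability goal mentioned in the abstract: reusing part of the previous frame's depth map suppresses frame-to-frame flicker.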


Author information

Corresponding author

Correspondence to Chun-Fu Chen.

About this article

Cite this article

Lee, G.G. (Chris), Chen, C.-F., Lin, H.-Y. et al. 3-D Video Generation from Monocular Video Based on Hierarchical Video Segmentation. J Sign Process Syst 81, 345–358 (2015). https://doi.org/10.1007/s11265-014-0955-3

  • DOI: https://doi.org/10.1007/s11265-014-0955-3
