3-D Video Generation from Monocular Video Based on Hierarchical Video Segmentation


Abstract

This paper presents an algorithm that generates 3-D video from monocular video through a hierarchical approach that characterizes the low-level features and high-level semantics of the video content to generate a depth map. Color and texture features are first used to characterize the video locally, and the video is segmented into several regions accordingly. The semantics of the segmented regions, including shape and motion, are then delineated from a higher-level perspective and used to refine the segmentation by measuring the interrelations among regions. Finally, based on the refined segmentation map and the region semantics, the proposed method generates a highly stable depth map using both spatial and temporal information. The stable depth map minimizes visual quality degradation, such as flicker and blurring, when viewing the 3-D video. Experimental results show that the proposed algorithm generates high-quality, stable depth maps. In addition, subjective viewing evaluation shows that the proposed algorithm surpasses commercial 2-D-to-3-D video conversion products, including TriDef 3D and CyberLink PowerDVD.
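
The abstract describes a pipeline of three stages: low-level segmentation from color and texture features, semantic refinement of the segmented regions, and spatio-temporally stabilized depth-map generation. The sketch below illustrates such a pipeline in Python with OpenCV; it is not the authors' implementation, and the function names (segment_frame, assign_depth, depth_maps), the vertical-position depth cue, and the smoothing weight ALPHA are assumptions made for illustration only.

# Minimal sketch of the pipeline stages described in the abstract.
# This is NOT the authors' implementation; function names, parameters,
# and the position-based depth cue are illustrative assumptions.
import cv2
import numpy as np

ALPHA = 0.8  # assumed temporal smoothing weight for depth stability

def segment_frame(frame_bgr):
    """Low-level stage: group pixels by color using mean-shift filtering,
    then label connected regions."""
    smoothed = cv2.pyrMeanShiftFiltering(frame_bgr, sp=15, sr=30)
    gray = cv2.cvtColor(smoothed, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    num_labels, labels = cv2.connectedComponents(binary)
    return labels, num_labels

def assign_depth(labels, num_labels, height):
    """Higher-level stage placeholder: assign each region a depth from its
    vertical position (lower regions assumed nearer), a common monocular cue."""
    depth = np.zeros(labels.shape, dtype=np.float32)
    for lbl in range(num_labels):
        mask = labels == lbl
        if mask.any():
            mean_row = np.mean(np.nonzero(mask)[0])
            depth[mask] = mean_row / float(height)  # 0 = far, 1 = near
    return depth

def depth_maps(video_path):
    """Yield one temporally smoothed 8-bit depth map per frame."""
    cap = cv2.VideoCapture(video_path)
    prev_depth = None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        labels, n = segment_frame(frame)
        depth = assign_depth(labels, n, frame.shape[0])
        if prev_depth is not None:
            # Blend with the previous depth map to suppress flicker.
            depth = ALPHA * prev_depth + (1.0 - ALPHA) * depth
        prev_depth = depth
        yield (depth * 255).astype(np.uint8)
    cap.release()

The temporal blend in depth_maps corresponds to the stability goal mentioned in the abstract: reusing part of the previous frame's depth map suppresses frame-to-frame flicker.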


Author information

Corresponding author

Correspondence to Chun-Fu Chen.

About this article

Cite this article

Lee, G.G. (Chris), Chen, C.-F., Lin, H.-Y. et al. 3-D Video Generation from Monocular Video Based on Hierarchical Video Segmentation. J Sign Process Syst 81, 345–358 (2015). https://doi.org/10.1007/s11265-014-0955-3

  • DOI: https://doi.org/10.1007/s11265-014-0955-3
