Abstract
This paper presents an algorithm that generates 3-D video from monocular video through a hierarchical approach, characterizing both the low-level features and the high-level semantics of the video content to produce a depth map. Color and texture features first describe the video's local characteristics, and the video is segmented into several regions accordingly. The semantics of the segmented regions, including shape and motion semantics, are then delineated from a higher-level perspective to refine the segmentation by measuring the interrelations among regions. Based on the refined segmentation map and the region semantics, the proposed method generates a highly stable depth map using both spatial and temporal information. This stable depth map minimizes visual quality degradation, such as flicker and blurring, when viewing the 3-D video. Experimental results demonstrate that the proposed algorithm produces high-quality, stable depth maps. In addition, subjective viewing evaluation shows that the proposed algorithm surpasses commercial 2-D-to-3-D video conversion products, including TriDef 3D and CyberLink PowerDVD.
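To make the two ideas highlighted in the abstract concrete, the following is a minimal sketch, not the authors' implementation: per-region depth assignment from simple spatial and motion cues, followed by temporal smoothing of the region depths to suppress flicker. The region label map is assumed to come from a separate color/texture segmentation stage (e.g. k-means or mean-shift clustering), and the cue weights, the smoothing factor `alpha`, and all function names are illustrative assumptions.

```python
import numpy as np

def region_depths(labels, prev_frame, curr_frame, motion_weight=0.5):
    """Assign a depth in [0, 1] to every segmented region.

    labels      : (H, W) int array of region ids from the segmentation stage
    prev_frame,
    curr_frame  : (H, W) grayscale arrays, used for a crude motion cue
    Returns a dict {region_id: depth}, where larger values mean closer.
    """
    h, w = labels.shape
    rows = np.arange(h).reshape(-1, 1).repeat(w, axis=1)      # row index per pixel
    motion = np.abs(curr_frame.astype(float) - prev_frame.astype(float))

    depths = {}
    for rid in np.unique(labels):
        mask = labels == rid
        # Spatial cue: regions lower in the frame are usually nearer the camera.
        position_cue = rows[mask].mean() / max(h - 1, 1)
        # Motion cue: stronger frame difference suggests a nearer, moving object.
        motion_cue = min(motion[mask].mean() / 255.0, 1.0)
        depths[rid] = (1 - motion_weight) * position_cue + motion_weight * motion_cue
    return depths

def stabilize(prev_depths, curr_depths, alpha=0.8):
    """Exponentially smooth per-region depths across frames to reduce flicker."""
    if prev_depths is None:
        return dict(curr_depths)
    return {rid: alpha * prev_depths.get(rid, d) + (1 - alpha) * d
            for rid, d in curr_depths.items()}

def depth_map(labels, depths):
    """Expand per-region depths into a dense (H, W) depth map."""
    dmap = np.zeros(labels.shape, dtype=float)
    for rid, d in depths.items():
        dmap[labels == rid] = d
    return dmap
```

In a full system the dense depth map would then drive depth-image-based rendering (DIBR) to synthesize the second view; the exponential smoothing shown here is only one simple way of obtaining the temporal stability the abstract refers to.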
Cite this article
Lee, G. G. (Chris), Chen, C.-F., Lin, H.-Y., et al. 3-D Video Generation from Monocular Video Based on Hierarchical Video Segmentation. J Sign Process Syst 81, 345–358 (2015). https://doi.org/10.1007/s11265-014-0955-3