Abstract
In this paper two efficient unsupervised video object segmentation approaches are proposed and thoroughly compared. Both methods are based on the exploitation of depth information, estimated from stereoscopic pairs. Depth is a more efficient semantic descriptor of visual content, since usually an object is located on one depth plane. However, depth information fails to accurately represent the contours of an object mainly due to erroneous disparity estimation and occlusion issues. For this reason, the first approach projects color segments onto depth information in order to address the limitations of both depth and color segmentation; color segmentation usually over-partitions an object into several regions, while depth fails to precisely represent object contours. Depth information is produced through an occlusion compensated disparity field and then a depth map is generated. On the contrary, color segmentation is accomplished by incorporating a modified version of the Multiresolution Recursive Shortest Spanning Tree segmentation algorithm (M-RSST). Next considering the first “Constrained Fusion of Color Segments” (CFCS) approach, a color segments map is created, by applying the M-RSST to one of the stereoscopic channels. In this case video objects are extracted by fusing color segments according to depth similarity criteria. The second method also utilizes the depth segments map. In particular an active contour is automatically initialized onto the boundary of each depth segment, which is usually different from a video object’s boundary. Initialization is accomplished by a fitness function that considers different color areas and preserves the shapes of depth segments’ boundaries. For acceleration purposes each point of the active contour is associated to an “attractive edge” point and a greedy approach is incorporated so that the active contour converges to its final position. Several experiments on real life stereoscopic sequences are performed and extensive comparisons in terms of speed and accuracy indicate the promising performance of both methods.




















Similar content being viewed by others
References
Ho, P.-G. (2011). Image Segmentation. InTech, ISBN 978-953-307-228-9.
Doulamis, N., Doulamis, A., Avrithis, Y., Ntalianis, K., & Kollias, S. (2000). Efficient summarization of stereoscopic video sequences. IEEE Transaction Circuits and Systems for Video Technology, 10(4), 501–517.
He, H. McKinnon, D. & Upcroft, B. (2011). Towards automatic object segmentation with sequential multiple views, ACRA 2011 Proceedings, Australian Robotics & Automation Association, (pp. 1–7).
Boukov,Y. & Jolly, M.-P. (2001). Interactive graph cuts for optimal boundary and region segmentation of objects in N-D images. In Proc. IEEE Int. Conf. on Computer Vision.
Rother,C., Kolmogorov, V., Blake, A. (2004) GrabCut — Interactive Foreground Extraction using Iterated Graph Cuts. ACM Transactions on Graphics (SIGGRAPH).
Zhang, G., Jia, J. & Bao, H. (2011). Simultaneous Multi-Body Stereo and Segmentation. In Proc. of the 13th International Conference on Computer Vision, Barcelona, Spain, Nov.
C. Zhang, L. Wang, and R. Yang, “Semantic segmentation of urban scenes using dense depth maps,” In ECCV, p.p. 708–721, 2010
Prisacariu, V. A., & Reid, I. D. (2012). PWP3D: real-time segmentation and tracking of 3D objects. International Journal of Computer Vision, 98(3), 335–354.
S. Izadi, D. Kim, O. Hilliges, D. Molyneaux, R. Newcombe, P. Kohli, J. Shotton, S. Hodges, D. Freeman, A. Davison and A. Fitzgibbon, “KinectFusion: Real-time 3D Reconstruction and Interaction Using a Moving Depth Camera,” In Proc. of ACM UIST, p.p. 559–568, 2011
Wang, L., Zhang, C., Yang, R. & Zhang, C. (2010). TofCut: Towards Robust Real-time Foreground Extraction Using a Time-of-Flight Camera. Fifth International Symposium on 3D Data Processing, Visualization and Transmission (3DPVT).
Zhang, G., Jia, J., Hua, W., & Bao, H. (2011). Robust bilayer segmentation and motion/depth estimation with a handheld camera. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(3), 603–617.
Bleyer, M., Rother, C., Kohli, P., Scharstein, D., & Sinha, S. (2011). “Object stereo - joint stereo matching and object segmentation”, in proc. Colorado: IEEE Computer Vision & Pattern Recognition.
Guillemaut J.Y. & Hilton, A. (2011). “oint multi-layer segmentation and reconstruction for free-viewpoint video applications. International Journal of Computer Vision, (pp. 1–28).
Xiao J. & Quan, L. (2009). Multiple view semantic segmentation for street view images. In Proc. of the IEEE 12th International Conference on Computer Vision, (pp. 686–693).
Liu, Z., Shi, R., Shen, L., Xue, Y., Ngan, K. N. & Zhang, Z. (2012). Unsupervised Salient Object Segmentation Based on Kernel Density Estimation and Two-Phase Graph Cut. IEEE Trans. on Multimedia, Vol. 14, No. 4, Aug.
Zhang, G., Jia, J., Hua, W. & Bao, H. (2011). Robust Bilayer Segmentation and Motion/Depth Estimation with a Handheld Camera. IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 33, No. 3, March.
Szeliski, R. (2010). Computer Vision: Algorithms and Applications (Texts in Computer Science), Springer, Nov.
Luenberger, D. G. & Ye, Y. (2010). Linear and Nonlinear Programming (International Series in Operations Research & Management Science), Springer, Nov.
Doulamis, A. D., Doulamis, N. D., Ntalianis,K. S. & Kollias, S. D. (1999). Unsupervised Semantic Object Segmentation of Stereoscopic Video Sequences. In Proc. of the IEEE International Conference on Information, Intelligence and Systems (ICIIS), Washington D.C., U.S.A, November.
Avrithis, Y., Doulamis, A., Doulamis N. & Kollias, S. (1999). A Stochastic Framework for Optimal Key Frame Extraction from MPEG Video Databases. Computer Vision and Image Understanding, Academic Press, Vol. 75, Nos 1/2, (pp. 3–24), July/August.
Busin, L., Vandenbroucke N. & Macaire, L. (2008). Color spaces and image segmentation,” Adances in Imaging and Electron Physics, Vol. 151, Chapter 2, (pp. 65–168). Orlando, FL, USA: Elsevier Inc.. (ISSN: 1076–5670).
Kass, M., Witkin, A., & Terzopoulos, D. (1987). Snakes: active contour models. International Journal of Computer Vision, 1, 321–331.
Xu, C., & Prince, J. L. (1998). Snakes, shapes and gradient vector flow. IEEE Transaction Image Processing, 7(3), 359–369.
Amini, A. A., Tehrani, S. & Weymouth, T. E. (1988). Using Dynamic Programming for Minimizing the Energy of Active Contours in the Presence of Hard Constraints. In Proc. of the Second International Conference on Computer Vision (ICCV), (pp. 95–99).
Williams, D. J., & Shah, M. (1992). A fast algorithm for active contours and curvature estimation. GVGIP: Image Understanding, 55(1), 14–26.
Slater,J. (1996). Eye to Eye with Stereoscopic TV. Image Technology, p. 23, Nov./Dec..
Girdwood, C. & Chiwy, P. (1996). MIRAGE: An ACTS Project in Virtual Production and Stereoscopy. IBC Conference Publication, No. 428, (pp. 155–160), Sept.
Wollborn, M. & Mech, R. (1997). Procedure for Objective Evaluation of VOP Generation Algorithms. Doc. ISO/IEC JTC1/SC29/WG11 MPEG97/2704, Fribourg, Switzerland, October.
Correia P. & Pereira, F. (2000). Objective Evaluation of Relative Segmentation Quality. In Proc. International Conference on Image Processing (ICIP), Vancouver, Canada, September.
P. Villegas, X. Marichal and A. Salcedo, “Objective Evaluation of Segmentation Masks in Video Sequences”, in Proc. Of Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS), Berlin, Germany, May-June 1999.
Mitiche A. & Ayed, I. B. (2010). Variational and Level Set Methods in Image Segmentation (Springer Topics in Signal Processing), Springer, Oct.
Dufaux, F., Popescu, B. P. & Cagnazzo, M. (2013). Emerging Technologies for 3D Video: Creation, Coding, Transmission and Rendering, Wiley, May.
Dhond, U. R., & Aggarwal, J. K. (1989). Structure from stereo - a review. IEEE Transactions on Systems, Man, and Cybernetics, 19(6), 1489–1510.
Liu, D., Xiong, Y., Pulli, K. & Shapiro, L. (2011). Estimating image segmentation difficulty. In Proceedings of the 7th International Conference on Machine Learning and Data Mining in Pattern Recognition, (pp. 484–495).
Acknowledgments
The authors wish to thank Dr. Chas Girdwood, the project manager of the ITC (Winchester), for providing the 3D video sequence “Eye to Eye”, which was produced in the framework of ACTS MIRAGE project. Furthermore the authors want to thank very much Dr. Siegmund Pastoor of the HHI (Berlin), for providing the video sequences of the DISTIMA project.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Ntalianis, K.S., Doulamis, A.D., Doulamis, N.D. et al. Unsupervised Segmentation of Stereoscopic Video Objects: Constrained Segmentation Fusion Versus Greedy Active Contours. J Sign Process Syst 81, 153–181 (2015). https://doi.org/10.1007/s11265-014-0921-0
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11265-014-0921-0