Image-Based Modeling by Joint Segmentation

Quan, Long; Wang, Jingdong; Tan, Ping; Yuan, Lu

doi:10.1007/s11263-007-0044-1

Published: 15 March 2007

Volume 75, pages 135–150 (2007)

International Journal of Computer Vision

Abstract

The paper first traces image-based modeling back to the feature tracking and factorization methods developed in Kanade's group since the 1980s. Both have inspired and motivated many important algorithms in structure from motion, 3D reconstruction and modeling. We then revisit the recent quasi-dense approach to structure from motion. Its key advantage is that it not only recovers structure from motion robustly enough for practical modeling, but also provides a sufficiently dense cloud of 3D points for objects to be modeled explicitly. To structure the available 3D points and the registered 2D image information, we argue that a joint segmentation of the 3D and 2D data is the fundamental stage for subsequent modeling. We finally propose a probabilistic framework for this joint segmentation. Since the optimal solution to such a joint segmentation is generally intractable, we develop approximate solutions. These methods are implemented and validated on real data sets.
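
The abstract's starting point is feature tracking in the Lucas–Kanade tradition. As an illustration only (this is not code from the paper), the sketch below estimates the translation of one small patch between two grayscale images with the classic Gauss–Newton Lucas–Kanade iteration. Images are plain Python lists of rows; the helper names `bilinear`, `lk_translation` and `gaussian_image`, the 9×9 window and the synthetic blob test are hypothetical choices made for this example.

```python
import math

def bilinear(img, x, y):
    """Sample img (list of rows) at real-valued (x, y), clamped to the image."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.001)
    y = min(max(y, 0.0), h - 1.001)
    x0, y0 = int(x), int(y)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * img[y0][x0] + ax * img[y0][x0 + 1]
    bot = (1 - ax) * img[y0 + 1][x0] + ax * img[y0 + 1][x0 + 1]
    return (1 - ay) * top + ay * bot

def lk_translation(prev, curr, cx, cy, half=4, iters=20):
    """Estimate the shift (dx, dy) of the patch centred at integer (cx, cy).

    Hypothetical helper: minimises sum [curr(p + d) - prev(p)]^2 over the
    window by Gauss-Newton (translation-only Lucas-Kanade, no pyramid).
    """
    dx = dy = 0.0
    window = [(i, j) for j in range(-half, half + 1) for i in range(-half, half + 1)]
    for _ in range(iters):
        # Normal equations G * delta = b, with G the 2x2 gradient matrix.
        gxx = gxy = gyy = bx = by = 0.0
        for i, j in window:
            x, y = cx + i + dx, cy + j + dy
            # Central-difference gradients of curr at the warped position.
            ix = (bilinear(curr, x + 1, y) - bilinear(curr, x - 1, y)) / 2
            iy = (bilinear(curr, x, y + 1) - bilinear(curr, x, y - 1)) / 2
            err = prev[cy + j][cx + i] - bilinear(curr, x, y)
            gxx += ix * ix
            gxy += ix * iy
            gyy += iy * iy
            bx += ix * err
            by += iy * err
        det = gxx * gyy - gxy * gxy
        if abs(det) < 1e-12:
            break  # untextured window: G is singular, shift is not observable
        ddx = (gyy * bx - gxy * by) / det
        ddy = (gxx * by - gxy * bx) / det
        dx += ddx
        dy += ddy
        if ddx * ddx + ddy * ddy < 1e-10:
            break  # converged
    return dx, dy

def gaussian_image(w, h, mx, my, sigma):
    """Synthetic test image: a Gaussian blob centred at (mx, my)."""
    return [[math.exp(-((x - mx) ** 2 + (y - my) ** 2) / (2 * sigma ** 2))
             for x in range(w)] for y in range(h)]

prev = gaussian_image(32, 32, 16.0, 16.0, 4.0)
curr = gaussian_image(32, 32, 17.5, 15.25, 4.0)  # blob moved by (1.5, -0.75)
dx, dy = lk_translation(prev, curr, 16, 16)
print(f"estimated shift: ({dx:.2f}, {dy:.2f})")
```

Each iteration solves the 2×2 normal equations built from image gradients. When that matrix is singular or badly conditioned (an untextured or edge-only window), the translation cannot be recovered reliably, which is the intuition behind selecting corner-like features before tracking (Shi and Tomasi 1994).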

References

Blake, A., Rother, C., Brown, M., Pérez, P., and Torr, P.H.S. 2004. Interactive image segmentation using an adaptive GMMRF model. In ECCV (1), pp. 428–441.
Boykov, Y. and Jolly, M. 2001. Interactive graph cuts for optimal boundary and region segmentation of objects in N-D images. In ICCV, pp. 105–112.
Boykov, Y., Veksler, O., and Zabih, R. 2001. Fast approximate energy minimization via graph cuts. IEEE Trans. Pattern Anal. Mach. Intell., 23(11):1222–1239.
Criminisi, A., Cross, G., Blake, A., and Kolmogorov, V. 2006. Bilayer segmentation of live video. In CVPR.
Dempster, A.P., Laird, N.M., and Rubin, D.B. 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1):1–38.
Faugeras, O. 1992. What can be seen in three dimensions with an uncalibrated stereo rig? In Sandini, G. (Ed.), Proceedings of the 2nd European Conference on Computer Vision, Santa Margherita Ligure, Italy, pp. 563–578. Springer-Verlag.
Faugeras, O., Luong, Q., and Papadopoulo, T. 2001. The geometry of multiple images. The MIT Press, Cambridge, MA, USA.
Förstner, W. 1994. A framework for low level feature extraction. In Proceedings of the 3rd European Conference on Computer Vision, Stockholm, Sweden, pp. 383–394.
Fua, P. 1991. Combining stereo and monocular information to compute dense depth maps that preserve discontinuities. In Proceedings of the 12th International Joint Conference on Artificial Intelligence, Sydney, Australia.
Gargallo, P. and Sturm, P. 2005. Bayesian 3D Modeling from Images Using Multiple Depth Maps. In CVPR (2), pp. 885–891.
Harris, C. and Stephens, M. 1988. A combined corner and edge detector. In Alvey Vision Conference, pp. 147–151.
Hartley, R.I. 1992. Estimation of relative camera positions for uncalibrated cameras. In Sandini, G. (Ed.), Proceedings of the 2nd European Conference on Computer Vision, Santa Margherita Ligure, Italy, pp. 579–587. Springer-Verlag.
Hartley, R.I. and Zisserman, A. 2000. Multiple View Geometry in Computer Vision. Cambridge University Press.
Matas, J., Chum, O., Urban, M., and Pajdla, T. 2002. Robust wide baseline stereo from maximally stable extremal regions. In British Machine Vision Conference, pp. 384–393.
Koenderink, J.J. and van Doorn, A.J. 1989. Affine structure from motion. Technical report, Utrecht University, Utrecht, The Netherlands, also appeared in Journal of the Optical Society of America A, 8(2):377–385, 1991.
Kolmogorov, V., Criminisi, A., Blake, A., Cross, G., and Rother, C. 2005. Bi-Layer segmentation of binocular stereo video. In CVPR (2), pp. 407–414.
Kolmogorov, V. and Zabih, R. 2002. Multi-camera scene reconstruction via graph cuts. In ECCV (3), pp. 82–96.
Lafferty, J.D., McCallum, A., and Pereira, F.C.N. 2001. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In ICML, pp. 282–289.
Laveau, S. 1996. Géométrie d'un système de N caméras. Théorie, estimation, et applications. Thèse de doctorat, École Polytechnique.
Lhuillier, M. 1998. Efficient dense matching for textured scenes using region growing. In Proceedings of the ninth British Machine Vision Conference, Southampton, England, pp. 700–709.
Lhuillier, M. and Quan, L. 2002. Image-based rendering by match propagation and joint view triangulation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(8):1140–1146.
Lhuillier, M. and Quan, L. 2005. A quasi-dense approach to surface reconstruction from uncalibrated images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(3):418–433.
Li, Y., Sun, J., Tang, C., and Shum, H. 2004. Lazy snapping. In Proceedings of ACM SIGGRAPH, pp. 303–308.
Lowe, D. 2004. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110.
Lucas, B.D. and Kanade, T. 1981. An iterative image registration technique with an application to stereo vision. In Proceedings of the 7th International Joint Conference on Artificial Intelligence.
Malik, J., Belongie, S., Leung, T.K., and Shi, J. 2001. Contour and texture analysis for image segmentation. International Journal of Computer Vision, 43(1):7–27.
Mohr, R., Quan, L., and Veillon, F. 1995. Relative 3D reconstruction using multiple uncalibrated images. International Journal of Robotics Research, 14(6):619–632.
Mohr, R., Quan, L., Veillon, F., and Boufama, B. 1992. Relative 3D reconstruction using multiple uncalibrated images. Technical Report RT 84-I-IMAG LIFIA 12, LIFIA–IRIMAG.
Moravec, H. 1979. Visual mapping by a robot rover. In Proceedings of the 6th International Joint Conference on Artificial Intelligence, Tokyo, Japan, pp. 598–600.
Moravec, H. 1981. Obstacle avoidance and navigation in the real world by a seeing robot rover. Technical report CMU-RI-TR-3, Carnegie Mellon University.
Nister, D. 2001. Automatic Dense Reconstruction from Uncalibrated Video Sequences. PhD thesis, NADA, KTH, Sweden.
Patras, I., Hendriks, E.A., and Lagendijk, R.L. 2001. Video segmentation by MAP labeling of watershed segments. IEEE Trans. Pattern Anal. Mach. Intell., 23(3):326–332.
Pollard, S.B., Mayhew, J.E.W., and Frisby, J.P. 1985. PMF: a stereo correspondence algorithm using a disparity gradient limit. Perception, 14:449–470.
Pollefeys, M., Koch, R., and Van Gool, L. 1998. Self-calibration and metric reconstruction in spite of varying and unknown internal camera parameters. In Proceedings of the 6th International Conference on Computer Vision, Bombay, India, pp. 90–95.
Quan, L. 1995. Invariants of six points and projective reconstruction from three uncalibrated images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17(1):34–46.
Quan, L., Tan, P., Zeng, G., Yuan, L., Wang, J., and Kang, S.B. 2006. Image-based plant modeling. In Proceedings of ACM SIGGRAPH.
Rother, C., Kolmogorov, V., and Blake, A. 2004. GrabCut: interactive foreground extraction using iterated graph cuts. In Proceedings of ACM SIGGRAPH, pp. 309–314.
Shi, J. and Malik, J. 2000. Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell., 22(8):888–905.
Shi, J. and Tomasi, C. 1994. Good features to track. In Proceedings of the Conference on Computer Vision and Pattern Recognition, Seattle, Washington, USA, pp. 593–600.
Sturm, P. and Triggs, B. 1996. A factorization based algorithm for multi-image projective structure and motion. In B. Buxton and R. Cipolla, editors, Proceedings of the 4th European Conference on Computer Vision, Cambridge, England, volume 1065 of Lecture Notes in Computer Science, pp. 709–720. Springer-Verlag.
Sun, J., Zhang, W., Tang, X., and Shum, H. 2006. Background Cut. In ECCV.
Tanner, M.A. and Wong, W.H. 1987. The calculation of posterior distributions by data augmentation (with discussion). Journal of the American Statistical Association, 82:528–550.
Tipping, M.E. and Bishop, C.M. 1999. Mixtures of probabilistic principal component analysers. Neural Computation, 11(2):443–482.
Tomasi, C. 1991. Shape and Motion from Image Streams: a Factorization Method. PhD thesis, Carnegie Mellon University, USA.
Tomasi, C. and Kanade, T. 1991. Detection and tracking of point features. Technical report CMU-CS-91-132, Carnegie Mellon University.
Tomasi, C. and Kanade, T. 1992. Shape and motion from image streams under orthography: A factorization method. International Journal of Computer Vision, 9(2):137–154.
Torr, P.H.S., Szeliski, R., and Anandan, P. 2001. An integrated bayesian approach to layer extraction from image sequences. IEEE Trans. Pattern Anal. Mach. Intell., 23(3):297–303.
Triggs, B. 1996. Factorization methods for projective structure and motion. In Proceedings of the Conference on Computer Vision and Pattern Recognition, San Francisco, California, USA, pp. 845–851.
Triggs, B. 1997. Autocalibration and the absolute quadric. In Proceedings of the Conference on Computer Vision and Pattern Recognition, Puerto Rico, USA, pp. 609–614. IEEE Computer Society Press.
Triggs, B. 2004. Detecting keypoints with stable position, orientation and scale under illumination changes. In European Conference on Computer Vision. Springer-Verlag.
Triggs, B., McLauchlan, P.F., Hartley, R.I., and Fitzgibbon, A. 2000. Bundle adjustment—a modern synthesis. In Triggs, B., Zisserman, A., and Szeliski, R. (Eds.), Vision Algorithms: Theory and Practice, volume 1883 of Lecture Notes in Computer Science, pp. 298–372. Springer-Verlag.
Tu, Z. and Zhu, S.C. 2002. Image segmentation by data-driven Markov chain Monte Carlo. IEEE Trans. Pattern Anal. Mach. Intell., 24(5):657–673.
Tuytelaars, T. and Van Gool, L. 2000. Wide baseline stereo based on local, affinely invariant regions. In British Machine Vision Conference, pp. 412–422.
Wang, J.Y.A. and Adelson, E.H. 1994. Representing moving images with layers. IEEE Transactions on Image Processing, 3(5):625–638.
Wei, Y., Ofek, E., Quan, L., and Shum, H. 2005. Modeling hair from multiple views. ACM Transactions on Graphics (TOG), Proceedings of ACM SIGGRAPH 2005 (SIGGRAPH), vol. 24, no. 3.
Wills, J., Agarwal, S., and Belongie, S. 2003. What went where. In CVPR (1), pp. 37–44.
Xiao, J. and Shah, M. 2004. Motion layer extraction in the presence of occlusion using graph cut. In CVPR (2), pp. 972–979.
Zabih, R. and Kolmogorov, V. 2004. Spatially coherent clustering using graph cuts. In CVPR (2), pp. 437–444.
Zeng, G., Paris, S., Quan, L., and Sillion, F. to appear. Accurate and scalable surface representation and reconstruction from images. IEEE Transactions on Pattern Analysis and Machine Intelligence (IEEE TPAMI).
Zhang, Z., Deriche, R., Faugeras, O.D., and Luong, Q.T. 1995. A robust technique for matching two uncalibrated images through the recovery of the unknown epipolar geometry. Artificial Intelligence, 78(1–2):87–120. Appeared in October 1995, also INRIA Research Report No.2273, May 1994.
Zhu, X. and Lafferty, J. 2005. Harmonic mixtures: combining mixture models and graph-based methods for inductive and scalable semi-supervised learning. In ICML, pp. 1052–1059.

Author information

Authors and Affiliations

The Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Kowloon, Hong Kong
Long Quan, Jingdong Wang, Ping Tan & Lu Yuan

Corresponding author

Correspondence to Long Quan.

About this article

Cite this article

Quan, L., Wang, J., Tan, P. et al. Image-Based Modeling by Joint Segmentation. Int J Comput Vis 75, 135–150 (2007). https://doi.org/10.1007/s11263-007-0044-1

Received: 10 May 2006
Accepted: 14 February 2007
Issue Date: October 2007