Abstract
We present a new optimization-based parsing framework for the geometric analysis of a single image of a man-made environment. The framework models the scene as a composition of geometric primitives spanning several layers, from low level (edges) through mid-level (line segments, lines and vanishing points) to high level (the zenith and the horizon). Inference in this model jointly estimates (a) the grouping of edges into line segments, (b) the grouping of line segments into straight lines, (c) the grouping of lines into parallel families, and (d) the positions of the horizon and the zenith in the image. This unified treatment means that uncertainty information propagates between the layers of the model. This is in contrast to most previous approaches to the same problem, which either ignore the middle levels (line segments or lines) altogether or use a bottom-up, step-by-step pipeline.
For the evaluation, we consider the publicly available York Urban dataset of “Manhattan” scenes and also introduce a new, harder dataset of 103 urban outdoor images containing many non-Manhattan scenes. The comparative evaluation on the horizon estimation task demonstrates the higher accuracy and robustness of our method compared to current state-of-the-art approaches.
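To make the relation between the layers concrete, the following minimal sketch shows the standard projective geometry that links line segments, vanishing points and the horizon. It is not the joint inference described in the abstract: it is a plain bottom-up computation on hand-picked toy coordinates, and all function names and segment values are assumptions introduced for illustration only.

```python
# Illustrative sketch only: toy coordinates and helper names are hypothetical,
# and this is a bottom-up computation, not the paper's joint optimization.

def cross(a, b):
    """Cross product of two 3-vectors (homogeneous points or lines)."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def line_through(p, q):
    """Homogeneous line through two image points given as (x, y)."""
    return cross((p[0], p[1], 1.0), (q[0], q[1], 1.0))

def intersection(l1, l2):
    """Intersection of two homogeneous lines as (x, y), or None if they are parallel in the image."""
    x, y, w = cross(l1, l2)
    if abs(w) < 1e-12:
        return None
    return (x / w, y / w)

# Two line segments from one family of parallel horizontal 3D lines.
seg_a = ((0.0, 0.0), (4.0, 1.0))
seg_b = ((0.0, 4.0), (4.0, 3.0))
# Two line segments from a second horizontal family.
seg_c = ((0.0, 0.0), (-4.0, 1.0))
seg_d = ((0.0, 4.0), (-4.0, 3.0))

# Each parallel family meets at its vanishing point.
vp1 = intersection(line_through(*seg_a), line_through(*seg_b))
vp2 = intersection(line_through(*seg_c), line_through(*seg_d))

# The horizon is the line through the vanishing points of horizontal families.
horizon = line_through(vp1, vp2)  # coefficients (a, b, c) of a*x + b*y + c = 0

print("vanishing points:", vp1, vp2)
print("horizon line:", horizon)
```

In this toy configuration both vanishing points lie on the image line y = 2, so the horizon is horizontal. With noisy real edges the segments of a family do not meet in a single point, which is one reason for estimating all layers jointly rather than step by step.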
Additional information
Elena Tretyak, Olga Barinova and Victor Lempitsky are supported by Microsoft Research programs in Russia. Victor Lempitsky is also supported by the EU under ERC grant VisRec no. 228180.
Cite this article
Tretyak, E., Barinova, O., Kohli, P. et al. Geometric Image Parsing in Man-Made Environments. Int J Comput Vis 97, 305–321 (2012). https://doi.org/10.1007/s11263-011-0488-1
DOI: https://doi.org/10.1007/s11263-011-0488-1