Abstract
Estimating camera position and orientation is an important step in many augmented reality (AR) applications. In this paper, we develop a system for estimating camera pose in large outdoor environments. A large set of images of outdoor scenes is collected, and the 3D structure of the scenes is recovered using a structure-from-motion technique. To improve image indexing accuracy and efficiency, a convolutional neural network (CNN) is employed to extract image features, and a set of locality-sensitive hashing (LSH) functions is used to classify the CNN features. With these techniques, camera localization is achieved by first retrieving the images nearest to the query image using the CNN and LSH, and then establishing a set of 2D-3D correspondences between the indexed images and the recovered 3D structure. A perspective-n-point (PnP) algorithm is then applied to the 2D-3D correspondences to estimate the camera pose. A series of experiments is conducted, and the results confirm the effectiveness of the proposed system: the nearest neighbors of a query image can be extracted accurately and efficiently, and the camera pose can be estimated accurately.
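The indexing step described above can be illustrated with a minimal sketch. The abstract does not state which LSH family, code length, or ranking rule the system uses, so everything below is an assumption made for illustration: sign random projection (random hyperplane) hashing, a common LSH family for cosine similarity, applied to toy 4-dimensional vectors that stand in for CNN features. All function names are hypothetical.

```python
import random
from collections import defaultdict


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random Gaussian hyperplanes of dimension dim (assumed LSH family)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def hash_code(vec, planes):
    """Map a feature vector to a tuple of bits, one per hyperplane (sign of the projection)."""
    return tuple(
        int(sum(p * v for p, v in zip(plane, vec)) >= 0.0) for plane in planes
    )


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b)


def build_index(features, planes):
    """Group database image ids into buckets keyed by their hash code."""
    index = defaultdict(list)
    for image_id, vec in features.items():
        index[hash_code(vec, planes)].append(image_id)
    return index


def query(vec, features, index, planes, k=3):
    """Return up to k image ids from the query's bucket, ranked by cosine similarity."""
    candidates = index.get(hash_code(vec, planes), [])
    ranked = sorted(candidates, key=lambda i: cosine(vec, features[i]), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Toy stand-ins for CNN feature vectors of three database images.
    database = {
        "img_a": [1.0, 0.0, 0.5, 0.2],
        "img_b": [-1.0, 0.3, -0.5, 0.0],
        "img_c": [0.9, 0.1, 0.6, 0.2],
    }
    planes = make_hyperplanes(num_bits=8, dim=4, seed=1)
    index = build_index(database, planes)
    print(query([1.0, 0.05, 0.55, 0.2], database, index, planes))
```

Only the images that share the query's bucket are compared exhaustively, which is where the efficiency gain of hashing comes from. A single hash table can miss true neighbors that fall on the other side of one hyperplane, so practical systems often use several tables; whether the paper does so is not stated in the abstract. The retrieved images would then supply the 2D-3D correspondences passed to a PnP solver.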
Acknowledgement
This work was supported in part by the Ministry of Science and Technology, Taiwan, under Grant Nos. MOST 104-2221-E-155-032 and MOST 104-3115-E-155-002.
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Teng, CH., Chen, YL., Zhang, X. (2017). Image-Based Camera Localization for Large and Outdoor Environments. In: Chen, CS., Lu, J., Ma, KK. (eds) Computer Vision – ACCV 2016 Workshops. ACCV 2016. Lecture Notes in Computer Science, vol 10117. Springer, Cham. https://doi.org/10.1007/978-3-319-54427-4_11
DOI: https://doi.org/10.1007/978-3-319-54427-4_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-54426-7
Online ISBN: 978-3-319-54427-4
eBook Packages: Computer Science, Computer Science (R0)