LandscapeAR: Large Scale Outdoor Augmented Reality by Matching Photographs with Terrain Models Using Learned Descriptors

Brejcha, Jan; Lukáč, Michal; Hold-Geoffroy, Yannick; Wang, Oliver; Čadík, Martin

doi:10.1007/978-3-030-58526-6_18

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12374)

Included in the following conference series:

European Conference on Computer Vision

Abstract

We introduce a solution to large scale Augmented Reality for outdoor scenes by registering camera images to textured Digital Elevation Models (DEMs). To accommodate the inherent differences in appearance between real images and DEMs, we train a cross-domain feature descriptor using Structure From Motion (SFM) guided reconstructions to acquire training data. Our method runs efficiently on a mobile device and outperforms existing learned and hand-designed feature descriptors for this task.

Notes

1. http://cphoto.fit.vutbr.cz/LandscapeAR/

Acknowledgement

This work was supported by project no. LTAIZ19004 Deep-Learning Approach to Topographical Image Analysis; by the Ministry of Education, Youth and Sports of the Czech Republic within the activity INTER-EXCELENCE (LT), subactivity INTER-ACTION (LTA), ID: SMSM2019LTAIZ. Computational resources were partly supplied by the project e-Infrastruktura CZ (e-INFRA LM2018140) provided within the program Projects of Large Research, Development and Innovations Infrastructures. Satellite Imagery: Data provided by the European Space Agency.

Author information

Authors and Affiliations

Faculty of Information Technology, CPhoto@FIT, Brno University of Technology, Božetěchova 2, 61200, Brno, Czech Republic
Jan Brejcha & Martin Čadík
Adobe Inc., 345 Park Ave, San Jose, CA, 95110-2704, USA
Jan Brejcha, Michal Lukáč, Yannick Hold-Geoffroy & Oliver Wang

Corresponding author

Correspondence to Jan Brejcha.

Editor information

Editors and Affiliations

University of Oxford, Oxford, UK
Andrea Vedaldi
Graz University of Technology, Graz, Austria
Horst Bischof
University of Freiburg, Freiburg im Breisgau, Germany
Thomas Brox
University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Jan-Michael Frahm

About this paper

Cite this paper

Brejcha, J., Lukáč, M., Hold-Geoffroy, Y., Wang, O., Čadík, M. (2020). LandscapeAR: Large Scale Outdoor Augmented Reality by Matching Photographs with Terrain Models Using Learned Descriptors. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12374. Springer, Cham. https://doi.org/10.1007/978-3-030-58526-6_18

DOI: https://doi.org/10.1007/978-3-030-58526-6_18
Published: 07 October 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58525-9
Online ISBN: 978-3-030-58526-6
eBook Packages: Computer Science, Computer Science (R0)
