Abstract
We introduce a solution to large-scale Augmented Reality for outdoor scenes by registering camera images to textured Digital Elevation Models (DEMs). To accommodate the inherent differences in appearance between real images and DEMs, we train a cross-domain feature descriptor, using Structure-from-Motion (SfM) guided reconstructions to acquire training data. Our method runs efficiently on a mobile device and outperforms existing learned and hand-designed feature descriptors for this task.
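The abstract does not spell out how descriptors from the two domains are paired. As a minimal sketch, assuming descriptors have already been extracted as fixed-length vectors from a photograph and from a rendered view of the DEM, mutual nearest-neighbour matching is one common way to propose correspondences. The function name and array shapes below are hypothetical illustrations, not the paper's implementation.

```python
import numpy as np


def mutual_nearest_neighbors(desc_photo, desc_render):
    """Pair descriptors that are each other's nearest neighbour.

    desc_photo:  (N, D) array of descriptors from the photograph (assumed input).
    desc_render: (M, D) array of descriptors from the rendered DEM (assumed input).
    Returns a (K, 2) integer array of (photo_index, render_index) pairs.
    """
    # Pairwise squared Euclidean distances:
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 * a.b
    sq_dist = (
        (desc_photo ** 2).sum(axis=1)[:, None]
        + (desc_render ** 2).sum(axis=1)[None, :]
        - 2.0 * desc_photo @ desc_render.T
    )
    # Best render descriptor for each photo descriptor, and vice versa.
    photo_to_render = sq_dist.argmin(axis=1)
    render_to_photo = sq_dist.argmin(axis=0)
    # Keep a pair only when the choice is mutual.
    photo_idx = np.arange(desc_photo.shape[0])
    mutual = render_to_photo[photo_to_render] == photo_idx
    return np.stack([photo_idx[mutual], photo_to_render[mutual]], axis=1)
```

In a registration pipeline, pairs like these would typically be passed to a robust pose solver, which discards the remaining outliers; that step is outside this sketch.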
Acknowledgement
This work was supported by project no. LTAIZ19004 Deep-Learning Approach to Topographical Image Analysis; by the Ministry of Education, Youth and Sports of the Czech Republic within the activity INTER-EXCELENCE (LT), subactivity INTER-ACTION (LTA), ID: SMSM2019LTAIZ. Computational resources were partly supplied by the project e-Infrastruktura CZ (e-INFRA LM2018140) provided within the program Projects of Large Research, Development and Innovations Infrastructures. Satellite Imagery: Data provided by the European Space Agency.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Brejcha, J., Lukáč, M., Hold-Geoffroy, Y., Wang, O., Čadík, M. (2020). LandscapeAR: Large Scale Outdoor Augmented Reality by Matching Photographs with Terrain Models Using Learned Descriptors. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12374. Springer, Cham. https://doi.org/10.1007/978-3-030-58526-6_18
DOI: https://doi.org/10.1007/978-3-030-58526-6_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58525-9
Online ISBN: 978-3-030-58526-6
eBook Packages: Computer Science, Computer Science (R0)