Abstract
Convolutional neural networks (CNNs) have become a mainstream approach to keypoint matching, in addition to image recognition, object detection, and semantic segmentation. The Learned Invariant Feature Transform (LIFT) is a pioneering CNN-based method that performs keypoint detection, orientation estimation, and feature description in a single network. Among these processes, orientation estimation is what provides invariance to rotation changes. However, unlike the keypoint detector and the feature descriptor, the orientation estimator has not been regarded as important for accurate keypoint matching and has been little researched, even after LIFT was proposed. In this paper, we propose a novel coarse-to-fine orientation estimator that improves matching accuracy. First, a coarse orientation estimator predicts an orientation that keeps the rotation error small even when a large rotation exists between an image pair. Second, a fine orientation estimator refines the orientation estimated by the coarse stage to further improve matching accuracy. With the proposed two-stage CNNs, we can estimate orientations accurately, which improves matching performance. Experimental results on the HPatches benchmark show that our method achieves a better precision-recall curve than single CNN-based orientation estimators.
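The two-stage design described in the abstract can be illustrated with a minimal sketch. The code below is not the authors' implementation: the network depths, the 32x32 grayscale patch size, the (cos, sin) angle parameterization, and the de-rotation step between the coarse and fine stages are all assumptions made for illustration, written here in PyTorch.

```python
# Minimal sketch of a coarse-to-fine orientation estimator (illustrative only;
# layer sizes, patch size, and angle parameterization are assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F

def rotate_patches(patches, angles):
    """Resample (N, C, H, W) patches with a per-patch rotation about the centre.
    Angles are in radians; the sign convention follows torch's affine_grid."""
    cos, sin = torch.cos(angles), torch.sin(angles)
    zeros = torch.zeros_like(cos)
    theta = torch.stack([
        torch.stack([cos, -sin, zeros], dim=-1),
        torch.stack([sin,  cos, zeros], dim=-1),
    ], dim=1)                                            # (N, 2, 3) affine matrices
    grid = F.affine_grid(theta, patches.size(), align_corners=False)
    return F.grid_sample(patches, grid, align_corners=False)

class AngleRegressor(nn.Module):
    """Small CNN that regresses a patch orientation as a (cos, sin) vector."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.head = nn.Linear(64 * 4 * 4, 2)             # for 32x32 input patches

    def forward(self, x):
        v = self.head(self.features(x).flatten(1))
        v = F.normalize(v, dim=1)                        # unit (cos, sin) vector
        return torch.atan2(v[:, 1], v[:, 0])             # angle in radians

class CoarseToFineOrientation(nn.Module):
    """Two stages: a coarse angle, de-rotation, then a fine residual angle."""
    def __init__(self):
        super().__init__()
        self.coarse = AngleRegressor()
        self.fine = AngleRegressor()

    def forward(self, patches):
        coarse_angle = self.coarse(patches)              # stage 1: coarse estimate
        derotated = rotate_patches(patches, -coarse_angle)  # undo coarse rotation
        fine_angle = self.fine(derotated)                # stage 2: residual estimate
        return coarse_angle + fine_angle

# Usage: patches is an (N, 1, 32, 32) batch of keypoint patches;
# the model returns one orientation per patch, in radians.
```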
References
Arandjelović, R., Zisserman, A.: Three things everyone should know to improve object retrieval. In: Conference on Computer Vision and Pattern Recognition, pp. 2911–2918 (2012)
Bailey, T., Durrant-Whyte, H.: Simultaneous localization and mapping (SLAM): part II. Robot. Autom. Mag. 13(3), 108–117 (2006)
Balntas, V., Lenc, K., Vedaldi, A., Mikolajczyk, K.: HPatches: a benchmark and evaluation of handcrafted and learned local descriptors. In: Computer Vision and Pattern Recognition, pp. 5173–5182 (2017)
Balntas, V., Riba, E., Ponsa, D., Mikolajczyk, K.: Learning local feature descriptors with triplets and shallow convolutional neural networks. In: British Machine Vision Conference, p. 119 (2016)
Bay, H., Tuytelaars, T., Gool, L.V.: SURF: Speeded-up robust features. Comput. Vis. Image Underst. 110(3), 346–359 (2008)
Brown, M., Lowe, D.G.: Automatic panoramic image stitching using invariant features. Int. J. Comput. Vis. 74(1), 59–73 (2007)
Durrant-Whyte, H., Bailey, T.: Simultaneous localization and mapping: part I. Robot. Autom. Mag. 13(2), 99–110 (2006)
Jaderberg, M., Simonyan, K., Zisserman, A., Kavukcuoglu, K.: Spatial transformer networks. In: Neural Information Processing Systems, pp. 2017–2025 (2015)
Kingma, D.P., Ba, J.L.: Adam: a method for stochastic optimization. In: International Conference on Learning Representations (2015)
Lowe, D.G.: Object recognition from local scale-invariant features. In: International Conference on Computer Vision (1999)
Calonder, M., Lepetit, V., Strecha, C., Fua, P.: BRIEF: binary robust independent elementary features. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010, Part IV. LNCS, vol. 6314, pp. 778–792. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15561-1_56
Mur-Artal, R., Montiel, J.M.M., Tardos, J.D.: ORB-SLAM: a versatile and accurate monocular SLAM system. IEEE Trans. Robot. 31(5), 1147–1163 (2015)
Nister, D., Stewenius, H.: Scalable recognition with a vocabulary tree. In: Computer Vision and Pattern Recognition, vol. 2, pp. 2161–2168 (2006)
Ono, Y., Trulls, E., Fua, P., Yi, K.M.: LF-Net: learning local features from images. In: Neural Information Processing Systems (2018)
Rublee, E., Rabaud, V., Konolige, K., Bradski, G.: ORB: an efficient alternative to SIFT or SURF. In: International Conference on Computer Vision, pp. 2564–2571 (2011)
Simo-Serra, E., Trulls, E., Ferraz, L., Kokkinos, I., Fua, P., Moreno-Noguer, F.: Discriminative learning of deep convolutional feature point descriptors. In: International Conference on Computer Vision, pp. 118–126 (2015)
Taylor, S., Drummond, T.: Binary histogrammed intensity patches for efficient and robust matching. Int. J. Comput. Vis. 94, 241–265 (2011)
Trzcinski, T., Christoudias, M., Lepetit, V.: Learning image descriptors with boosting. Pattern Anal. Mach. Intell. 37, 597–610 (2015)
Yi, K.M., Trulls, E., Lepetit, V., Fua, P.: LIFT: learned invariant feature transform. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016, Part VI. LNCS, vol. 9910, pp. 467–483. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46466-4_28
Yi, K.M., Verdie, Y., Fua, P., Lepetit, V.: Learning to assign orientations to feature points. In: Computer Vision and Pattern Recognition, pp. 107–116 (2016)
Zagoruyko, S., Komodakis, N.: Learning to compare image patches via convolutional neural networks. In: Computer Vision and Pattern Recognition, pp. 4353–4361 (2015)
Cite this paper
Mori, Y., Hirakawa, T., Yamashita, T., Fujiyoshi, H. (2020). Coarse-to-Fine Deep Orientation Estimator for Local Image Matching. In: Palaiahnakote, S., Sanniti di Baja, G., Wang, L., Yan, W. (eds) Pattern Recognition. ACPR 2019. Lecture Notes in Computer Science, vol. 12046. Springer, Cham. https://doi.org/10.1007/978-3-030-41404-7_27
DOI: https://doi.org/10.1007/978-3-030-41404-7_27