Fusion representation learning for keypoint detection and description

Original article · The Visual Computer

Abstract

Keypoint detection and description are the basis of many computer vision applications, such as object recognition and image analysis. Current deep learning-based methods have made great progress in jointly learning keypoint detection and description. Low-level features have proven helpful for both keypoint detection and description. However, current detectors and descriptors focus more on high-level features and ignore the importance of low-level features; they simply concatenate features and lack sufficient feature fusion. In this work, we propose a fusion representation learning network that fuses different levels of features for both the detector and the descriptor. Furthermore, we design an adaptive feature fusion structure for the descriptor. Extensive experiments on the HPatches, FM-Bench, and Day-Night datasets demonstrate the superiority of our approach.
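The adaptive multi-level fusion idea can be illustrated with a minimal PyTorch sketch. This is not the authors' released implementation: the backbone channel sizes, the softmax-weighted per-level fusion, and the module name `AdaptiveFusionDescriptor` are illustrative assumptions about one plausible way to fuse low- and high-level feature maps into a dense descriptor.

```python
# Minimal sketch of multi-level feature fusion with learned,
# softmax-normalized per-level weights for a descriptor branch.
# Illustrative assumption, not the paper's code: channel sizes,
# weighting scheme, and descriptor dimension are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveFusionDescriptor(nn.Module):
    def __init__(self, channels=(64, 128, 256), desc_dim=128):
        super().__init__()
        # Project each feature level to a common descriptor dimension.
        self.proj = nn.ModuleList(
            nn.Conv2d(c, desc_dim, kernel_size=1) for c in channels
        )
        # One learnable scalar per level; softmax makes the fusion adaptive.
        self.level_logits = nn.Parameter(torch.zeros(len(channels)))

    def forward(self, feats):
        # feats: feature maps from shallow (low-level) to deep (high-level);
        # upsample every level to the finest resolution before fusing.
        target = feats[0].shape[-2:]
        projected = [
            F.interpolate(p(f), size=target, mode="bilinear",
                          align_corners=False)
            for p, f in zip(self.proj, feats)
        ]
        w = torch.softmax(self.level_logits, dim=0)
        fused = sum(wi * fi for wi, fi in zip(w, projected))
        # L2-normalize per-pixel descriptors, as is common for matching.
        return F.normalize(fused, p=2, dim=1)

# Usage with random multi-level features (batch 1, three levels):
feats = [torch.randn(1, 64, 120, 160),
         torch.randn(1, 128, 60, 80),
         torch.randn(1, 256, 30, 40)]
desc = AdaptiveFusionDescriptor()(feats)
print(desc.shape)  # torch.Size([1, 128, 120, 160])
```

The learned softmax weights let the descriptor rebalance low- and high-level contributions during training, in contrast to a plain channel concatenation with fixed, equal emphasis.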

Data availability

Data sharing is not applicable to this article as no datasets were generated or analyzed during the current study.


Acknowledgements

This work was supported in part by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education under Grant NRF-2020R1F1A1072332, in part by the National Natural Science Foundation of China under Grant 61231010, and in part by a scholarship from the China Scholarship Council (CSC) under Grant CSC No. 202006020119.

Author information

Corresponding author

Correspondence to Shantong Sun.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Sun, S., Park, U., Sun, S. et al. Fusion representation learning for keypoint detection and description. Vis Comput 39, 5683–5692 (2023). https://doi.org/10.1007/s00371-022-02689-7
