UP-Net: unique keyPoint description and detection net

Original Paper · Machine Vision and Applications

Abstract

Many computer vision tasks, such as simultaneous localization and mapping, visual localization, image retrieval, pose estimation, and structure-from-motion, rely on matching keypoints between image pairs. Recently, jointly learned keypoint description and detection networks with simple structures have shown highly competitive performance. However, most of them have two limitations: (1) the positioning accuracy of the detected keypoints is poor, which hurts many downstream applications; (2) because detection emphasizes only repeatability, mismatches occur easily in repetitively textured areas. In this work, we make two enhancements to D2-Net to address these problems. First, feature fusion is used to enrich feature information across levels and improve the positioning accuracy of keypoints. Second, a uniqueness index is added to keypoint detection, eliminating keypoints that lie on repeated patterns in textured regions and making the detected keypoints more effective and accurate. Furthermore, we use homographies to build correspondences between image pairs, which enables unsupervised training. Our method achieves leading performance on the HPatches dataset for image matching, especially on its illumination sequences, with a 5% improvement over the state-of-the-art ASLFeat method at a projection error threshold of 10 px. Meanwhile, our keypoint positioning accuracy is twice that of D2-Net at a strict projection error threshold. The method also performs competitively in 3D reconstruction and visual localization experiments.
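
To make the detection-side ideas above concrete, the following PyTorch sketch illustrates (a) a D2-Net-style soft detection score computed from a dense feature map, (b) a uniqueness term that down-weights keypoints whose descriptors are near-duplicates of many others in the same image, i.e., repeated texture, and (c) the homography warp that yields ground-truth correspondences for unsupervised training. This is a minimal sketch under our own naming (soft_detection_score, uniqueness_score, and warp_keypoints are hypothetical names), not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def soft_detection_score(feat):
    """D2-Net-style soft detection score from a dense feature map.

    feat: (1, C, H, W) feature map, assumed non-negative (post-ReLU).
    Returns a (1, H, W) normalized score map.
    """
    # Channel-wise ratio: how dominant each channel is at every pixel.
    channel_score = feat / (feat.max(dim=1, keepdim=True).values + 1e-8)
    # Local spatial softness: each activation relative to its 3x3 neighborhood.
    exp = feat.exp()
    local_sum = F.avg_pool2d(exp, kernel_size=3, stride=1, padding=1) * 9
    local_score = exp / (local_sum + 1e-8)
    score = (channel_score * local_score).max(dim=1).values
    return score / (score.sum() + 1e-8)

def uniqueness_score(desc, k=5):
    """Uniqueness term: a keypoint whose descriptor is similar to many other
    descriptors in the same image (repeated texture) gets a score near 0.

    desc: (N, D) L2-normalized descriptors at candidate keypoints.
    """
    sim = desc @ desc.t()            # pairwise cosine similarity, (N, N)
    sim.fill_diagonal_(-1.0)         # exclude self-similarity
    knn_sim = sim.topk(k, dim=1).values.mean(dim=1)
    return 1.0 - knn_sim.clamp(min=0.0)

def warp_keypoints(kps, H):
    """Warp (N, 2) pixel coordinates with a 3x3 homography H. Warping an
    image and its keypoints this way yields ground-truth correspondences
    for unsupervised training, with no manual annotation."""
    ones = torch.ones(kps.shape[0], 1, dtype=kps.dtype)
    pts = torch.cat([kps, ones], dim=1) @ H.t()
    return pts[:, :2] / (pts[:, 2:3] + 1e-8)
```

In use, one would sample the soft detection score at candidate locations and multiply it by the uniqueness score, so that keypoints that are repeatable but ambiguous on repetitive texture are suppressed, which is exactly the failure mode the abstract describes.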


References

  1. Sattler, T., Torii, A., Sivic, J., Pollefeys, M., Taira, H., Okutomi, M., Pajdla, T.: Are Large-Scale 3D Models Really Necessary for Accurate Visual Localization? In: IEEE Conference on Computer Vision & Pattern Recognition (2017)

  2. Svarm, L., Enqvist, O., Kahl, F., Oskarsson, M.: City-Scale Localization for Cameras with Known Vertical Direction. IEEE Transactions on Pattern Analysis & Machine Intelligence, 1455-1461 (2016)

  3. Yu, H.W., Moon, J.Y., Lee, B.H.: A Variational Observation Model of 3D Object for Probabilistic Semantic SLAM. In: 2019 International Conference on Robotics and Automation (ICRA) (2019)

  4. Sattler, T., Weyand, T., Leibe, B., Kobbelt, L.: Image Retrieval for Image-Based Localization Revisited. In: British Machine Vision Conference (2012)

  5. Kundu, J.N., Rahul, M.V., Ganeshan, A., Babu, R.V.: Object Pose Estimation from Monocular Image using Multi-View Keypoint Correspondence. In: European Conference on Computer Vision Workshops (2018)

  6. Schonberger, J.L., Frahm, J.M.: Structure-from-Motion Revisited. In: IEEE Conference on Computer Vision & Pattern Recognition (2016)

  7. Heinly, J., Schönberger, J.L., Dunn, E., Frahm, J.M.: Reconstructing the World* in Six Days *(As Captured by the Yahoo 100 Million Image Dataset). In: Computer Vision and Pattern Recognition (CVPR) (2015)

  8. Lowe, D.G.: Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision 60(2), 91-110 (2004)

  9. Bay, H., Tuytelaars, T., Gool, L.V.: SURF: Speeded up robust features. Computer Vision and Image Understanding 110(3), 404-417 (2008)

  10. Harris, C.G., Stephens, M.J.: A combined corner and edge detector. In: Alvey vision conference (1988)

  11. Mikolajczyk, K., Schmid, C.: An affine invariant interest point detector. In: European Conference on Computer Vision (2002)

  12. Leutenegger, S., Chli, M., Siegwart, R.Y.: BRISK: Binary Robust invariant scalable keypoints. In: International Conference on Computer Vision (2011)

  13. Rublee, E., Rabaud, V., Konolige, K., Bradski, G.: ORB: An efficient alternative to SIFT or SURF. In: International Conference on Computer Vision (2011)

  14. Detone, D., Malisiewicz, T., Rabinovich, A.: SuperPoint: Self-Supervised Interest Point Detection and Description. In: IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) (2018)

  15. Revaud, J., Weinzaepfel, P., De Souza, C., Pion, N., Csurka, G., Cabon, Y., Humenberger, M.: R2D2: Repeatable and Reliable Detector and Descriptor. In: Advances in Neural Information Processing Systems (NeurIPS) (2019)

  16. Dusmanu, M., Rocco, I., Pajdla, T., Pollefeys, M., Sivic, J., Torii, A., Sattler, T.: D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019)

  17. Luo, Z., Zhou, L., Bai, X., Chen, H., Quan, L.: ASLFeat: Learning Local Features of Accurate Shape and Localization. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020)

  18. Balntas, V., Lenc, K., Vedaldi, A., Mikolajczyk, K.: HPatches: A Benchmark and Evaluation of Handcrafted and Learned Local Descriptors. In: IEEE Conference on Computer Vision & Pattern Recognition (2017)

  19. Rosten, E., Drummond, T.: Fusing points and lines for high performance tracking. In: Tenth IEEE International Conference on Computer Vision (2005)

  20. Tutsoy, O., Barkana, D.E.: Model free adaptive control of the under-actuated robot manipulator with the chaotic dynamics. ISA Transactions (2021)

  21. Yi, K.M., Trulls, E., Lepetit, V., Fua, P.: LIFT: Learned Invariant Feature Transform. In: European Conference on Computer Vision (2016)

  22. Ono, Y., Trulls, E., Fua, P., Yi, K.M.: LF-Net: Learning Local Features from Images. In: Advances in Neural Information Processing Systems (NeurIPS) (2018)

  23. Tian, Y., Fan, B., Wu, F.: L2-Net: Deep Learning of Discriminative Patch Descriptor in Euclidean Space. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)

  24. Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable Convolutional Networks. In: 2017 IEEE International Conference on Computer Vision (ICCV) (2017)

  25. Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable ConvNets V2: More Deformable, Better Results. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019)

  26. Simonyan, K., Zisserman, A.: Very Deep Convolutional Networks for Large-Scale Image Recognition. In: International Conference on Learning Representations (ICLR) (2015)

  27. Lin, T.Y., Maire, M., Belongie, S., Hays, J., Zitnick, C.L.: Microsoft COCO: Common Objects in Context. In: European Conference on Computer Vision (2014)

  28. Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. In: International Conference on Learning Representations (ICLR) (2015)

  29. Deng, J., Dong, W., Socher, R., Li, L.J., Li, F.F.: ImageNet: A large-scale hierarchical image database. In: IEEE Conference on Computer Vision & Pattern Recognition (2009)

  30. Mishkin, D., Radenovic, F., Matas, J.: Repeatability Is Not Enough: Learning Affine Regions via Discriminability. In: European Conference on Computer Vision (ECCV) (2018)

  31. Mishchuk, A., Mishkin, D., Radenovic, F., Matas, J.: Working hard to know your neighbor's margins: Local descriptor learning loss. In: Advances in Neural Information Processing Systems (NIPS) (2017)

  32. Perdoch, M., Chum, O., Matas, J.: Efficient representation of local geometry for large scale object retrieval. In: IEEE Conference on Computer Vision & Pattern Recognition (2009)

  33. Noh, H., Araujo, A., Sim, J., Weyand, T., Han, B.: Large-Scale Image Retrieval with Attentive Deep Local Features. In: 2017 IEEE International Conference on Computer Vision (ICCV) (2017)

  34. Schonberger, J.L., Hardmeier, H., Sattler, T., Pollefeys, M.: Comparative Evaluation of Hand-Crafted and Learned Local Features. In: IEEE Conference on Computer Vision & Pattern Recognition (2017)

  35. Wilson, K., Snavely, N.: Robust Global Translations with 1DSfM. In: European Conference on Computer Vision (ECCV) (2014)

  36. Schönberger, J.L., Zheng, E., Frahm, J.M., Pollefeys, M.: Pixelwise View Selection for Unstructured Multi-View Stereo. In: European Conference on Computer Vision (ECCV) (2016)

  37. Luo, Z., Shen, T., Zhou, L., Zhu, S., Zhang, R., Yao, Y., Fang, T., Quan, L.: GeoDesc: Learning Local Descriptors by Integrating Geometry Constraints. In: European Conference on Computer Vision (2018)

  38. Sattler, T., Maddern, W., Toft, C., Torii, A., Hammarstrand, L., Stenborg, E., Safari, D., Okutomi, M., Pollefeys, M., Sivic, J.: Benchmarking 6DOF Outdoor Visual Localization in Changing Conditions. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2018)

  39. Luo, Z., Shen, T., Zhou, L., Zhang, J., Yao, Y., Li, S., Fang, T., Quan, L.: ContextDesc: Local Descriptor Augmentation with Cross-Modality Context. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019)

Author information

Correspondence to Jun Fang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Yang, N., Han, Y., Fang, J. et al. UP-Net: unique keyPoint description and detection net. Machine Vision and Applications 33, 13 (2022). https://doi.org/10.1007/s00138-021-01266-7
