Repeatable adaptive keypoint detection via self-supervised learning

  • Research Paper
  • Published in: Science China Information Sciences

Abstract

Keypoint-based matching is a fundamental technology for many computer vision tasks, and keypoint detection is a crucial step that directly affects overall performance. Learning-based keypoint detectors built on deep networks have advanced considerably. To further improve the accuracy of high-level matching tasks, the extracted keypoints should provide more accurate point-to-point correspondences and maintain a uniform spatial distribution. Based on this idea, a self-supervised keypoint-detection method named repeatable adaptive point is proposed. The method consists of a self-supervised objective and an optimization algorithm. The objective maximizes a repeatability measure under a sparsity constraint on the keypoints. The sparsity constraint combines a non-maximum suppression operation with a penalty on the number of keypoints, which generally yields a uniform spatial distribution of keypoints. A novel approximate alternating optimization algorithm is proposed to maximize this objective, and its convergence is proved in theory. The proposed detector is "adaptive" because its combinations with existing descriptors can adapt to high-level matching tasks with a fast convergence speed. Specifically, its combinations with the SuperPoint/HardNet descriptors achieve state-of-the-art accuracy on three high-level tasks based on image matching, namely homography estimation, camera pose estimation, and three-dimensional reconstruction. Furthermore, the proposed method converges faster on new scenes than the state-of-the-art method that jointly optimizes the detector and the descriptor.
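To make the objective concrete, the sketch below gives a minimal, illustrative reading of it in Python. It is an assumption-laden toy, not the paper's implementation: the grid-based non-maximum suppression, the helper names (`nms_keypoints`, `objective`), the matching tolerance `tol`, the penalty weight `lam`, and the user-supplied `warp` function are all placeholders chosen for illustration. It scores a detection heatmap by how often its surviving local maxima reappear under a known warp of the image, then subtracts a penalty proportional to the number of detected keypoints, i.e., the sparsity constraint that encourages sparse, spatially spread-out detections.

```python
# Illustrative sketch only (not the paper's code): a toy repeatability
# objective with a sparsity constraint built from non-maximum suppression
# (NMS) and a penalty on the keypoint count.
import numpy as np

def nms_keypoints(score_map, window=8, threshold=0.5):
    """Grid NMS: keep at most one keypoint per window x window cell."""
    h, w = score_map.shape
    keypoints = []
    for y in range(0, h, window):
        for x in range(0, w, window):
            cell = score_map[y:y + window, x:x + window]
            iy, ix = np.unravel_index(np.argmax(cell), cell.shape)
            if cell[iy, ix] > threshold:
                keypoints.append((y + iy, x + ix))
    return keypoints

def objective(score_map, warped_score_map, warp, lam=0.01, tol=3.0):
    """Repeatability of NMS keypoints across a known warp, minus a
    penalty proportional to the number of detected keypoints.
    `warp` is assumed to map an (y, x) point in view 1 into view 2."""
    kps = nms_keypoints(score_map)
    kps_w = nms_keypoints(warped_score_map)
    if not kps or not kps_w:
        return -lam * (len(kps) + len(kps_w))
    projected = [warp(p) for p in kps]            # view-1 points mapped into view 2
    repeat = sum(                                  # fraction of points that repeat
        min(np.hypot(py - qy, px - qx) for qy, qx in kps_w) < tol
        for py, px in projected
    ) / len(projected)
    return repeat - lam * (len(kps) + len(kps_w))
```

In the paper, an objective of this non-differentiable kind is maximized with an approximate alternating optimization algorithm whose convergence is proved; the sketch above only evaluates the objective for a fixed detector output.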

Acknowledgements

This work was partially supported by the National Natural Science Foundation of China (Grant No. 41371339), the National R&D Program for Major Research Instruments of the Natural Science Foundation of China (Grant No. 62027808), and the Fundamental Research Funds for the Central Universities (Grant No. 2017KFYXJJ179).

Author information

Corresponding author: Yihua Tan.

About this article

Cite this article

Yan, P., Tan, Y. & Tai, Y. Repeatable adaptive keypoint detection via self-supervised learning. Sci. China Inf. Sci. 65, 212103 (2022). https://doi.org/10.1007/s11432-021-3364-5
