DeFusion: Aerial Image Matching Based on Fusion of Handcrafted and Deep Features

Song, Xianfeng; Zou, Yi; Shi, Zheng; Yang, Yanfeng; Li, Dacheng

doi:10.1007/978-981-99-8181-6_25


Part of the book series: Communications in Computer and Information Science (CCIS, volume 1968)

Included in the following conference series:

International Conference on Neural Information Processing


Abstract

Machine vision has become a crucial means for drones to perceive their surroundings, and image matching, a fundamental task in machine vision, has accordingly gained widespread attention. However, because aerial images are complex, traditional matching methods based on handcrafted features cannot extract high-level semantics and inevitably suffer from low robustness. Deep learning has the potential to improve matching accuracy, but it requires specific training samples and substantial computing resources, which makes it infeasible in many scenarios. To leverage the strengths of both approaches, we introduce DeFusion, a novel image matching scheme built around a fine-grained decision-level fusion algorithm that combines handcrafted and deep features. We train generic features on public datasets so that the scheme can handle unseen scenarios. We use RootSIFT as prior knowledge to guide the extraction of deep features, which significantly reduces computational overhead. We also design preprocessing steps that incorporate drone attitude information. Our experimental results show that, compared with existing methods, the proposed scheme achieves 2.5–6x more correct matches overall, with improved robustness.
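
The abstract names RootSIFT as the handcrafted prior. RootSIFT (Arandjelović and Zisserman, 2012) replaces each SIFT descriptor with the element-wise square root of its L1-normalized version, so comparing RootSIFT vectors with Euclidean distance is equivalent to comparing the original histograms with the Hellinger kernel. The plain-Python sketch below shows that transform followed by a conventional brute-force matcher with Lowe's ratio test. It is an illustration under stated assumptions, not the DeFusion implementation: the function names, the `eps` guard, and the 0.8 ratio threshold are hypothetical choices, and neither the way RootSIFT guides deep-feature extraction nor the decision-level fusion step is shown.

```python
import math
from typing import List, Sequence, Tuple


def root_sift(descriptor: Sequence[float], eps: float = 1e-7) -> List[float]:
    """Convert one SIFT descriptor to RootSIFT: L1-normalize, then take element-wise square roots."""
    # SIFT bins are non-negative; clamp defensively so sqrt never sees a negative value.
    values = [max(0.0, float(v)) for v in descriptor]
    l1 = sum(values) + eps  # eps (an assumed value) avoids division by zero for empty descriptors
    return [math.sqrt(v / l1) for v in values]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain L2 distance; on RootSIFT vectors it ranks pairs like the Hellinger kernel on SIFT."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def ratio_test_matches(
    query: Sequence[Sequence[float]],
    train: Sequence[Sequence[float]],
    ratio: float = 0.8,
) -> List[Tuple[int, int]]:
    """Brute-force nearest-neighbour matching filtered by Lowe's ratio test."""
    matches: List[Tuple[int, int]] = []
    if len(train) < 2:
        return matches  # the ratio test needs a second-nearest neighbour
    for i, q in enumerate(query):
        # Sort candidates by distance; keep the best only if it clearly beats the runner-up.
        ranked = sorted((euclidean(q, t), j) for j, t in enumerate(train))
        (d1, j1), (d2, _) = ranked[0], ranked[1]
        if d1 < ratio * d2:
            matches.append((i, j1))
    return matches


if __name__ == "__main__":
    # Toy 4-bin "descriptors" stand in for real 128-D SIFT vectors.
    train = [root_sift([1, 0, 0, 0]), root_sift([0, 1, 0, 0]), root_sift([0, 0, 1, 1])]
    query = [root_sift([2, 0, 0, 0]), root_sift([1, 1, 0, 0])]
    # The second query is equidistant from train[0] and train[1], so the ratio test rejects it.
    print(ratio_test_matches(query, train))  # [(0, 0)]
```

Running the file prints `[(0, 0)]`: the first query matches unambiguously, while the second is discarded because its two nearest neighbours are equally close. Because each RootSIFT vector has (up to `eps`) unit L2 norm, the transform can be dropped in front of any existing SIFT matching pipeline without changing the matcher itself.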


Notes

1.
Our code is publicly available on GitHub: https://github.com/songxf1024/DeFusion.


Acknowledgment

This research was supported in part by the South China University of Technology Research Start-up Fund No. X2WD/K3200890, as well as partly by the Guangzhou Huangpu District International Research Collaboration Fund No. 2022GH13. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the sponsoring agencies.

Author information

Authors and Affiliations

South China University of Technology, Guangzhou, China
Xianfeng Song, Yi Zou, Zheng Shi & Yanfeng Yang
Gosuncn Technology Group CO., LTD, Guangzhou, China
Dacheng Li


Corresponding author

Correspondence to Yi Zou.

Editor information

Editors and Affiliations

School of Automation, Central South University, Changsha, China
Biao Luo
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Long Cheng
Institute of Cyber-Systems and Control, Zhejiang University, Hangzhou, China
Zheng-Guang Wu
School of Automation, Guangdong University of Technology, Guangzhou, China
Hongyi Li
School of Electrical Engineering and Telecommunications, UNSW Sydney, Sydney, NSW, Australia
Chaojie Li


About this paper

Cite this paper

Song, X., Zou, Y., Shi, Z., Yang, Y., Li, D. (2024). DeFusion: Aerial Image Matching Based on Fusion of Handcrafted and Deep Features. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Communications in Computer and Information Science, vol 1968. Springer, Singapore. https://doi.org/10.1007/978-981-99-8181-6_25


DOI: https://doi.org/10.1007/978-981-99-8181-6_25
Published: 27 November 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8180-9
Online ISBN: 978-981-99-8181-6
eBook Packages: Computer Science, Computer Science (R0)
