ABSTRACT
Keypoint feature extraction algorithms based on handcrafted features rely on the designer's domain knowledge and usually perform well at capturing the detail of an image. However, these handcrafted algorithms tend to miss the abstract information in the image, which lowers their robustness. To address this, this paper proposes an image feature extraction algorithm that adaptively integrates the low-layer and high-layer features of a CNN. The proposed algorithm makes use of the fact that low-layer features capture the detail of an image, while high-layer features capture its abstract information. The similarity between the area descriptors of two feature points is used to adaptively determine the weights of the high-layer and low-layer features in the integrated feature, which realizes a trade-off between the distinguishability and invariance of the features and makes the integrated feature more robust. Experiments on the HPatches, Oxford, and RDNIM datasets show that the proposed algorithm is not only robust to illumination changes, but also shows superior performance in challenging scenes such as large viewpoint changes and day-night matching.
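The adaptive weighting described in the abstract can be illustrated with a minimal sketch. The abstract does not give the weighting formula, so everything specific here is an assumption: each feature point is assumed to carry one local descriptor and one area descriptor per layer, cosine similarity is assumed as the similarity measure, and a softmax over the two area-descriptor similarities is assumed as the way to obtain the weights. The function name `fused_similarity` and all parameter names are hypothetical and are not taken from the paper.

```python
import numpy as np


def l2_normalize(v, eps=1e-12):
    """Scale a vector to (approximately) unit length."""
    v = np.asarray(v, dtype=np.float64)
    return v / (np.linalg.norm(v) + eps)


def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(l2_normalize(a) @ l2_normalize(b))


def softmax(x):
    """Numerically stable softmax over a 1-D sequence."""
    x = np.asarray(x, dtype=np.float64)
    e = np.exp(x - np.max(x))
    return e / e.sum()


def fused_similarity(low_a, low_b, high_a, high_b,
                     area_low_a, area_low_b, area_high_a, area_high_b):
    """Weighted similarity between feature points a and b.

    ASSUMPTION: the layer weights come from a softmax over the
    area-descriptor similarities of each layer. This is an illustrative
    choice, not the formula used in the paper.
    """
    # Similarity of the local descriptors, computed per layer.
    sim_low = cosine(low_a, low_b)
    sim_high = cosine(high_a, high_b)

    # Similarity of the area descriptors, computed per layer.
    area_sims = [cosine(area_low_a, area_low_b),
                 cosine(area_high_a, area_high_b)]

    # The layer whose area descriptors agree more gets the larger weight.
    w_low, w_high = softmax(area_sims)

    fused = w_low * sim_low + w_high * sim_high
    return fused, (float(w_low), float(w_high))


# Toy example: low-layer descriptors agree, high-layer descriptors do not.
fused, (w_low, w_high) = fused_similarity(
    low_a=[1.0, 0.0], low_b=[1.0, 0.0],
    high_a=[1.0, 0.0], high_b=[0.0, 1.0],
    area_low_a=[1.0, 0.0], area_low_b=[1.0, 0.0],
    area_high_a=[1.0, 0.0], area_high_b=[0.0, 1.0],
)
print(f"w_low={w_low:.3f}, w_high={w_high:.3f}, fused={fused:.3f}")
```

In this toy case the low-layer area descriptors match and the high-layer ones do not, so the low-layer similarity dominates the fused score. Under these assumptions the weights always sum to one, which keeps the fused score within the range spanned by the two per-layer similarities.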
Index Terms
- An Adaptive Fusion Feature Extraction Algorithm Based on CNN