ABSTRACT
Accurate recognition of image targets is a fundamental intelligent perception task, and extracting effective features is its prerequisite. However, both hand-crafted features and those learned by deep neural networks still suffer from problems such as insufficient generalization ability, limited applicability, and insufficient robustness. This paper first selects representative hand-crafted features and deep convolutional features and analyzes their strengths and weaknesses from the perspective of saliency. Then, inspired by this analysis, a feature-saliency extraction process is modeled as the feature coding of an autoencoder based on the extreme learning machine (ELM-AE) and further integrated into an object recognition framework composed of a deep convolutional feature extractor and an extreme learning machine classifier. As a consequence, a hybrid neural network model for object recognition based on feature saliency is proposed. Finally, experimental results on the German Traffic Sign Recognition Benchmark (GTSRB) show that the proposed model achieves better performance.
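The pipeline the abstract describes can be sketched as follows. This is a minimal illustration, not the authors' code: an ELM autoencoder (random hidden layer, closed-form output weights that reconstruct the input) re-codes deep convolutional features, and a plain ELM classifier predicts labels from the re-coded features. The layer sizes, `tanh` activation, ridge regularizer, and the toy random inputs standing in for GTSRB convolutional features are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def elm_ae_encode(X, n_hidden=256, ridge=1e-3):
    """ELM-AE sketch: a random hidden layer plus closed-form output
    weights beta that reconstruct the input; beta then serves as a
    feature-coding matrix applied to the input."""
    n_features = X.shape[1]
    W = rng.standard_normal((n_features, n_hidden))
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)                       # random projection
    # beta maps hidden activations back to the input (autoencoding)
    beta = np.linalg.solve(H.T @ H + ridge * np.eye(n_hidden), H.T @ X)
    return X @ beta.T                            # re-coded features

def elm_classify(X_train, y_onehot, X_test, n_hidden=512, ridge=1e-3):
    """Plain ELM classifier: random hidden layer + least-squares readout."""
    W = rng.standard_normal((X_train.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X_train @ W + b)
    beta = np.linalg.solve(H.T @ H + ridge * np.eye(n_hidden), H.T @ y_onehot)
    H_test = np.tanh(X_test @ W + b)
    return (H_test @ beta).argmax(axis=1)

# Toy stand-in for deep convolutional features of traffic-sign images
X = rng.standard_normal((100, 64))
y = rng.integers(0, 4, size=100)
Y = np.eye(4)[y]                  # one-hot labels
Z = elm_ae_encode(X)              # feature-saliency coding step
pred = elm_classify(Z, Y, Z)      # classify the re-coded features
print(pred.shape)
```

Both stages are trained without backpropagation: the hidden weights stay random and only the output weights are solved in closed form, which is what makes the ELM-based coder and classifier fast to fit.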
Index Terms
- A Feature Saliency Based Hybrid Neural Network Model for Object Recognition