DOI: 10.1145/3635638.3635648
Research Article

A Feature Saliency Based Hybrid Neural Network Model for Object Recognition

Published: 16 January 2024

ABSTRACT

Accurate recognition of image targets is a fundamental intelligent perception task, and extracting effective features is its prerequisite. However, both hand-crafted features and features learned by deep neural networks still suffer from insufficient generalization ability, limited applicability, and insufficient robustness. This paper first selects representative hand-crafted features and deep convolutional features and analyzes their strengths and weaknesses from the perspective of saliency. Inspired by this analysis, the feature saliency extraction process is modeled as the feature coding of an autoencoder based on the extreme learning machine (ELM-AE) and integrated into an object recognition framework composed of a deep convolutional feature extractor and an extreme learning machine classifier. As a consequence, a hybrid neural network model for object recognition based on feature saliency is proposed. Finally, experimental results on the German Traffic Sign Recognition Benchmark (GTSRB) show that the proposed model achieves better performance.


Published in

MLMI '23: Proceedings of the 6th International Conference on Machine Learning and Machine Intelligence
October 2023, 196 pages
ISBN: 9798400709456
DOI: 10.1145/3635638

      Copyright © 2023 ACM


Publisher: Association for Computing Machinery, New York, NY, United States
