The image annotation algorithm using convolutional features from intermediate layer of deep learning

Multimedia Tools and Applications

Abstract

Automatic image annotation predicts labels for an unseen image by learning the latent relationships between the semantic concept space and the visual feature space of an annotated image dataset. The task usually comprises two stages: learning and labeling. Existing annotation methods that use convolutional features from deep learning models have several limitations, including complex training and high space and time costs during annotation. Accordingly, this paper proposes a method in which the visual features of an image are represented by the intermediate-layer features of a deep learning model, while each semantic concept is represented by the mean vector of its positive samples. First, the convolutional output of an intermediate layer of a pre-trained deep learning model is taken directly as the low-level visual features, and the image is represented by sparse coding. Second, the positive mean vector method is used to construct a visual feature vector for each word in the text vocabulary, yielding a visual feature vector database. Finally, the similarity between the visual feature vector of a test image and that of every vocabulary word is calculated, and the words with the largest similarity are used for annotation. Experiments demonstrate the effectiveness of the proposed method: in terms of F1 score, its performance on the Corel5k and IAPR TC-12 datasets is superior to that of MBRM, JEC-AF, JEC-DF, and 2PKNN with end-to-end deep features.
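
The second and third steps of the abstract (one mean vector per vocabulary word, then similarity-based labeling) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `positive_mean_vectors` and `annotate`, the toy feature values, and the two-word vocabulary are hypothetical, and cosine similarity is an assumption, since the abstract does not name the similarity measure. The feature rows stand in for intermediate-layer convolutional features after sparse coding, which are not computed here.

```python
import numpy as np


def positive_mean_vectors(features, labels):
    """Build one visual vector per vocabulary word.

    features: (n_images, d) array of image feature vectors.
    labels:   (n_images, n_words) binary matrix; labels[i, j] == 1 means
              image i is a positive sample for word j.
    Returns an (n_words, d) array holding, for each word, the mean feature
    vector of its positive samples.
    """
    labels = labels.astype(float)
    counts = labels.sum(axis=0)        # positive samples per word
    sums = labels.T @ features         # summed features per word
    # Guard against words with no positive sample (their row stays zero).
    return sums / np.maximum(counts, 1.0)[:, None]


def annotate(feature, word_vectors, top_k=1):
    """Return indices of the top_k most similar words and all similarities.

    Cosine similarity is an assumption made for this sketch.
    """
    eps = 1e-12
    f = feature / (np.linalg.norm(feature) + eps)
    norms = np.linalg.norm(word_vectors, axis=1, keepdims=True)
    w = word_vectors / (norms + eps)
    sims = w @ f
    order = np.argsort(-sims)[:top_k]
    return order.tolist(), sims


# Toy training set: four images, three-dimensional features, two words.
vocab = ["sky", "grass"]               # hypothetical vocabulary
train_features = np.array([
    [1.0, 0.0, 0.0],
    [0.8, 0.2, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.9, 0.1],
])
train_labels = np.array([
    [1, 0],
    [1, 0],
    [0, 1],
    [0, 1],
])

word_vectors = positive_mean_vectors(train_features, train_labels)
test_feature = np.array([0.9, 0.1, 0.0])
top, sims = annotate(test_feature, word_vectors, top_k=1)
print([vocab[i] for i in top])
```

Because each word is reduced to a single vector, labeling a test image costs one matrix-vector product over the vocabulary rather than a search over all training images, which is consistent with the abstract's stated aim of lowering space and time costs.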

Acknowledgments

We are grateful to all students and teachers who participated in this study and all the colleagues working to realize this project.

Funding

This work was supported in part by the National Natural Science Foundation of China under Grants 61972056, 61772454, 61402053, and 61981340416; the Natural Science Foundation of Hunan Province of China under Grant 2020JJ4623; the Scientific Research Fund of Hunan Provincial Education Department under Grants 17A007, 19C0028, and 19B005; the Changsha Science and Technology Planning under Grants KQ1703018, KQ1706064, KQ1703018-01, and KQ1703018-04; the Junior Faculty Development Program Project of Changsha University of Science and Technology under Grant 2019QJCZ011; the “Double First-class” International Cooperation and Development Scientific Research Project of Changsha University of Science and Technology under Grant 2019IC34; the Practical Innovation and Entrepreneurship Ability Improvement Plan for Professional Degree Postgraduate of Changsha University of Science and Technology under Grant SJCX202072; and the Postgraduate Training Innovation Base Construction Project of Hunan Province under Grants 2019-248-51 and 2020-172-48.

Author information

Corresponding author

Correspondence to Yuantao Chen.

Ethics declarations

Conflict of interest

No potential conflict of interest was reported by the authors.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Chen, Y., Liu, L., Tao, J. et al. The image annotation algorithm using convolutional features from intermediate layer of deep learning. Multimed Tools Appl 80, 4237–4261 (2021). https://doi.org/10.1007/s11042-020-09887-2

  • DOI: https://doi.org/10.1007/s11042-020-09887-2
