
CNN deep learning-based image to vector depiction


Abstract

In the computational science and engineering domains, describing image content remains an intricate problem. Such a description requires accurate recognition of the various objects and individuals in a scene, together with their attributes, relationships, and panoramic (scene-level) information. Accordingly, we describe image contents in natural language, i.e., image description generation, using a convolutional neural network (CNN)-assisted deep learning (CNN-DL) approach in which images are transformed into vectors. Deep learning and the attributes studied from machine-learned data were used to construct complete pictures of the real world. The work comprises two parts: an improved CNN-based image classification method used to build a classification model, and, drawing on the classification results, a novel method for describing an image through a vector for each object it contains. The learning and relationship stage covered all the essential categorization and classification entities. In addition, the developed system was extended to handle open detection and hazard classification. A performance evaluation on the CIFAR dataset revealed that the newly developed system handles test images from a new and unseen domain with greater strength and flexibility than the reported techniques.
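
The abstract does not give the exact network, so the following is a minimal, hypothetical sketch (PyTorch, with assumed layer sizes, a 128-dimensional vector, and the class name CNNImageToVector chosen here for illustration) of the general idea: a small CNN encodes a 32x32 CIFAR image into a fixed-length vector, and a linear head on that vector provides the classification signal.

# Illustrative sketch only, not the authors' architecture: a CNN encoder that
# maps a CIFAR-sized image to a fixed-length vector, plus a classification head.
import torch
import torch.nn as nn

class CNNImageToVector(nn.Module):
    def __init__(self, vector_dim: int = 128, num_classes: int = 10):
        super().__init__()
        # Convolutional feature extractor: 3x32x32 -> 128x4x4
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # -> 32x16x16
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # -> 64x8x8
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # -> 128x4x4
        )
        # Project the pooled features to the fixed-length image vector
        self.to_vector = nn.Linear(128 * 4 * 4, vector_dim)
        # Classification head over the vector (e.g., the 10 CIFAR-10 labels)
        self.classifier = nn.Linear(vector_dim, num_classes)

    def forward(self, x: torch.Tensor):
        h = self.features(x).flatten(start_dim=1)
        vector = self.to_vector(h)          # image-to-vector depiction
        logits = self.classifier(vector)    # class scores used for training
        return vector, logits

# Usage: encode a batch of CIFAR-sized images into vectors and class scores.
model = CNNImageToVector()
images = torch.randn(4, 3, 32, 32)          # stand-in for CIFAR inputs
vectors, logits = model(images)
print(vectors.shape, logits.shape)          # torch.Size([4, 128]) torch.Size([4, 10])

In such a setup the per-image (or per-object-crop) vector is what downstream description modules would consume, while the classification head supplies the supervised signal during training.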





Acknowledgments

The authors are extremely thankful to Universiti Teknologi Malaysia (UTM), the Ministry of Higher Education Malaysia (MOHE), and RMC for the research grants FRGS Q.J130000.2509.21H11, UTM RA ICONIC GRANT Q.J130000.4354.09G60, FRGS-04E86, and UTMFR 21H78. The authors are also grateful to the Research Management Centre, Universiti Teknologi Malaysia (RMC-UTM) for support under the postdoctoral fellowship scheme.

Author information


Corresponding author

Correspondence to A.A. Salim.

Ethics declarations

Conflict of interest

All authors have participated in (a) conception and design, or analysis and interpretation of the data; (b) drafting the article or revising it critically for important intellectual content; and (c) approval of the final version.

This manuscript has not been submitted to, nor is it under review at, another journal or other publishing venue.

The authors have no affiliation with any organization with a direct or indirect financial interest in the subject matter discussed in the manuscript.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Waheed, S.R., Rahim, M.S.M., Suaib, N.M. et al. CNN deep learning-based image to vector depiction. Multimed Tools Appl 82, 20283–20302 (2023). https://doi.org/10.1007/s11042-023-14434-w


