CNN deep learning-based image to vector depiction

Waheed, Safa Riyadh; Rahim, Mohd Shafry Mohd; Suaib, Norhaida Mohd; Salim, A.A.

doi:10.1007/s11042-023-14434-w

Published: 31 January 2023

Volume 82, pages 20283–20302, (2023)

Multimedia Tools and Applications

Safa Riyadh Waheed¹,²,
Mohd Shafry Mohd Rahim¹,
Norhaida Mohd Suaib¹ &
…
A.A. Salim ORCID: orcid.org/0000-0002-2801-9673³

Abstract

In computational science and engineering, describing the content of an image remains a difficult problem. Such a description requires accurate recognition of the objects and individuals in the image, together with their attributes, their relationships, and the surrounding scene. We therefore address natural-language image description generation with a convolutional neural network (CNN)-assisted deep learning (CNN-DL) approach in which images are transformed into vectors. Deep learning and attributes learned from machine-learned data were used to construct complete pictures of the real world. The work has two parts: an improved CNN method for building an image classification model, and a novel method that uses the classification results to describe an image as a vector for each object it contains. The learning and relationship activity covered all the entities essential for categorization and classification. In addition, the developed system was extended to handle open detection and hazard classification. Performance evaluation on the CIFAR dataset showed that the developed system is more robust and flexible than previously reported techniques when handling test images from a new and isolated domain.
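To make the image-to-vector idea concrete, the sketch below shows one generic way a convolutional feature extractor can turn an image into a fixed-length vector: convolve with a set of filters, apply ReLU, then average each feature map to a single number. It is an illustration under stated assumptions, not the authors' model. The function names, the two hand-picked filters, and the toy image are all hypothetical, and a trained CNN would learn its filters from data such as CIFAR.

```python
from typing import List

Image = List[List[float]]   # grayscale image as rows of pixel intensities
Kernel = List[List[float]]  # small square filter


def conv2d_valid(image: Image, kernel: Kernel) -> Image:
    """Slide the kernel over the image with stride 1 and no padding.

    Like most deep learning libraries, this computes a cross-correlation
    (the kernel is not flipped).
    """
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(k) for j in range(k))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(fmap: Image) -> Image:
    """Clamp negative responses to zero."""
    return [[max(0.0, v) for v in row] for row in fmap]


def global_average_pool(fmap: Image) -> float:
    """Reduce a whole feature map to its mean activation."""
    values = [v for row in fmap for v in row]
    return sum(values) / len(values)


def image_to_vector(image: Image, kernels: List[Kernel]) -> List[float]:
    """Return one pooled activation per filter: the image's feature vector."""
    return [global_average_pool(relu(conv2d_valid(image, k))) for k in kernels]


# Hypothetical hand-picked filters; a trained CNN would learn these.
KERNELS = [
    [[1.0, 0.0, -1.0], [1.0, 0.0, -1.0], [1.0, 0.0, -1.0]],  # vertical edges
    [[1.0, 1.0, 1.0], [0.0, 0.0, 0.0], [-1.0, -1.0, -1.0]],  # horizontal edges
]

# 4x4 toy image: bright left half, dark right half.
toy = [[1.0, 1.0, 0.0, 0.0] for _ in range(4)]
vector = image_to_vector(toy, KERNELS)
print(vector)
```

For this toy input the vertical-edge filter responds and the horizontal-edge filter does not, so the resulting vector has one entry per filter and separates the two cases. A classifier or description generator would consume such vectors rather than raw pixels.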

Acknowledgments

The authors are extremely thankful to Universiti Teknologi Malaysia (UTM), the Ministry of Higher Education Malaysia (MOHE), and RMC for research grant FRGS Q.J130000.2509.21H11, UTM RA ICONIC GRANT Q.J130000.4354.09G60, FRGS-04E86, and UTMFR 21H78. The authors are also grateful to the Research Management Centre, Universiti Teknologi Malaysia (RMC-UTM) for support under the Postdoctoral Fellowship Scheme.

Author information

Authors and Affiliations

Faculty of Engineering, School of Computing, Universiti Teknologi Malaysia, 81310, Johor Bahru, Malaysia
Safa Riyadh Waheed, Mohd Shafry Mohd Rahim & Norhaida Mohd Suaib
Computer Techniques Engineering Department, Faculty of Information Technology, Imam Jaafar Al-sadiq University, Najaf, Iraq
Safa Riyadh Waheed
Laser Center and Physics Department, Faculty of Science, Universiti Teknologi Malaysia, Johor, Malaysia
A.A. Salim

Corresponding author

Correspondence to A.A. Salim.

Ethics declarations

Conflict of interest

All authors have participated in (a) conception and design, or analysis and interpretation of the data; (b) drafting the article or revising it critically for important intellectual content; and (c) approval of the final version.

This manuscript has not been submitted to, nor is under review at, another journal or other publishing venue.

The authors have no affiliation with any organization with a direct or indirect financial interest in the subject matter discussed in the manuscript.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Waheed, S.R., Rahim, M.S.M., Suaib, N.M. et al. CNN deep learning-based image to vector depiction. Multimed Tools Appl 82, 20283–20302 (2023). https://doi.org/10.1007/s11042-023-14434-w

Received: 30 November 2021
Revised: 22 November 2022
Accepted: 21 January 2023
Published: 31 January 2023
Issue Date: May 2023
DOI: https://doi.org/10.1007/s11042-023-14434-w
