Abstract
Describing the content of images remains a difficult problem in computational science and engineering. Such a description requires accurate recognition of the objects and people in an image, together with their attributes, relationships, and overall scene context. To address this, we describe image contents in natural language (image description generation) using a convolutional neural network (CNN)-assisted deep learning (CNN-DL) approach in which images are transformed into vectors. Deep learning and attributes learned from data were used to construct complete representations of real-world pictures. The work comprises two parts: an improved CNN-based image classification method used to develop a classification model, and, building on its classification results, a novel method that describes an image as a vector for each object it contains. The learning and relationship modeling covered all the essential entities to be categorized and classified. In addition, the system was extended to handle open detection and hazard classification. A performance evaluation on the CIFAR dataset showed that the proposed system handles test images from new and isolated domains with greater robustness and flexibility than previously reported techniques.
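To make the idea of transforming an image into a vector concrete, the following minimal Python sketch shows one common way a convolutional layer maps an image to a fixed-length vector. Each filter is slid over the image, passed through a ReLU, and globally average-pooled into a single number, so the vector has one component per filter. This is an illustration under stated assumptions, not the authors' CNN-DL architecture: the two 2×2 edge filters are hand-set and hypothetical (a trained CNN learns its filters from data such as CIFAR), the image is a single-channel toy, and no classification head is included.

```python
"""Minimal sketch: turn a grayscale image into a fixed-length vector with a
tiny, untrained convolutional layer. Illustrative only; not the paper's model."""
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """2D cross-correlation (what CNN layers compute), stride 1, no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out


def relu(fmap: Matrix) -> Matrix:
    """Element-wise ReLU: negative responses are clipped to zero."""
    return [[max(0.0, v) for v in row] for row in fmap]


def global_avg_pool(fmap: Matrix) -> float:
    """Average a whole feature map down to a single number."""
    total = sum(sum(row) for row in fmap)
    return total / (len(fmap) * len(fmap[0]))


def image_to_vector(image: Matrix, kernels: List[Matrix]) -> List[float]:
    """One filter -> one feature map -> one vector component."""
    return [global_avg_pool(relu(conv2d_valid(image, k))) for k in kernels]


# Hypothetical hand-set filters; a real CNN learns these from training data.
VERTICAL_EDGE = [[-1.0, 1.0], [-1.0, 1.0]]
HORIZONTAL_EDGE = [[-1.0, -1.0], [1.0, 1.0]]

if __name__ == "__main__":
    # 4x4 toy image: dark left half, bright right half (a vertical edge).
    img = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
    vec = image_to_vector(img, [VERTICAL_EDGE, HORIZONTAL_EDGE])
    print(vec)
```

Because the toy image contains only a vertical edge, the first (vertical-edge) component comes out at about 0.667 and the second (horizontal-edge) component is 0.0. In a practical CNN, many such layers are stacked and the resulting vector is passed to a classifier.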
Acknowledgments
The authors are grateful to Universiti Teknologi Malaysia (UTM), the Ministry of Higher Education Malaysia (MOHE), and RMC for research grants FRGS Q.J130000.2509.21H11, UTM RA ICONIC GRANT Q.J130000.4354.09G60, FRGS-04E86, and UTMFR 21H78. The authors are also grateful to the Research Management Centre, Universiti Teknologi Malaysia (RMC-UTM), for support under the Postdoctoral Fellowship Scheme.
Ethics declarations
Conflict of interest
All authors have participated in (a) conception and design, or analysis and interpretation of the data; (b) drafting the article or revising it critically for important intellectual content; and (c) approval of the final version.
This manuscript has not been submitted to, nor is under review at, another journal or other publishing venue.
The authors have no affiliation with any organization with a direct or indirect financial interest in the subject matter discussed in the manuscript.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Waheed, S.R., Rahim, M.S.M., Suaib, N.M. et al. CNN deep learning-based image to vector depiction. Multimed Tools Appl 82, 20283–20302 (2023). https://doi.org/10.1007/s11042-023-14434-w
DOI: https://doi.org/10.1007/s11042-023-14434-w