Deep neural combinational model (DNCM): digital image descriptor for child’s independent learning

Naqvi, Nuzhat; Islam, M. Shujah; Iqbal, Mansoor; Kanwal, Shamsa; Khan, Asad; Ye, ZhongFu

doi:10.1007/s11042-022-12291-7

Multimedia Tools and Applications, Volume 81, pages 29955–29975 (2022)

Published: 05 April 2022

Nuzhat Naqvi ORCID: orcid.org/0000-0001-5265-6026¹,
M. Shujah Islam¹,
Mansoor Iqbal¹,
Shamsa Kanwal¹,
Asad Khan² &
ZhongFu Ye¹


Abstract

This project is an endeavor to address preschool children’s independent learning. Technology increasingly pervades our lives, and working parents are too busy to maintain their social setup. As a result, the art of preschool education for young children has become rare or has vanished. Automatic image descriptors (captioning models) have recently proven effective, which motivates us to use such models for this purpose. Unfortunately, existing image descriptors produce only complex, generic visual descriptions that are unsuited to children’s understanding. A suitable image descriptor is therefore needed as teaching material for young children at the initial educational stage. To fill this gap, we introduce a novel digital image descriptor and the 3k-Flickr-SDD dataset, built with smart augmentation from images of solitary dogs that we extracted and labeled from the Flickr8k dataset and the Stanford Dogs Dataset (SDD). The 3k-Flickr-SDD dataset is further split into two versions to meet standard experimental requirements. The proposed method combines Convolutional Neural Networks (CNNs) for image content extraction with a customized Long Short-Term Memory (LSTM) language model that generates understandable and attractive text from dog images. Our quantitative and qualitative analyses show that the proposed model outperforms existing models.
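The abstract pairs a CNN image encoder with an LSTM language model. The sketch that follows is not the authors’ DNCM; it is a minimal, untrained, pure-Python illustration of the generic CNN-LSTM captioning pattern under stated assumptions: the CNN has already produced a feature vector, the image feature is fed as the first LSTM input ("init-inject", in the style of Show and Tell by Vinyals et al. 2015), and words are chosen by greedy decoding. The class name `TinyLSTMCaptioner`, the method `greedy_caption`, the toy vocabulary, and the random weights are all hypothetical, so the generated words carry no meaning; only the data flow is shown.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Dense matrix-vector product on plain Python lists.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rng, rows, cols, scale=0.5):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class TinyLSTMCaptioner:
    """Hypothetical toy decoder: CNN feature vector -> LSTM -> words (untrained)."""

    def __init__(self, vocab, feat_dim, embed_dim=8, hidden_dim=8, seed=0):
        rng = random.Random(seed)  # fixed seed, so results are reproducible
        self.vocab = vocab
        self.hidden_dim = hidden_dim
        # Projects the (assumed, precomputed) CNN feature into embedding space.
        self.W_img = rand_matrix(rng, embed_dim, feat_dim)
        # One embedding row per vocabulary word.
        self.embed = rand_matrix(rng, len(vocab), embed_dim)
        # Input, forget, output, and candidate gates act on [x_t; h_{t-1}].
        in_dim = embed_dim + hidden_dim
        self.W = {g: rand_matrix(rng, hidden_dim, in_dim) for g in "ifoc"}
        self.b = {g: [0.0] * hidden_dim for g in "ifoc"}
        # Output layer: hidden state -> one score per vocabulary word.
        self.W_out = rand_matrix(rng, len(vocab), hidden_dim)

    def step(self, x, h, c):
        z = x + h  # list concatenation gives [x_t; h_{t-1}]
        pre = {g: [a + b for a, b in zip(matvec(self.W[g], z), self.b[g])] for g in "ifoc"}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        cand = [math.tanh(v) for v in pre["c"]]
        c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, cand)]
        h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
        return h, c

    def greedy_caption(self, cnn_features, max_len=10):
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        # Init-inject: the projected image feature is the first LSTM input.
        h, c = self.step(matvec(self.W_img, cnn_features), h, c)
        word = self.vocab.index("<start>")
        caption = []
        for _ in range(max_len):
            h, c = self.step(self.embed[word], h, c)
            scores = matvec(self.W_out, h)
            word = max(range(len(scores)), key=scores.__getitem__)  # greedy pick
            if self.vocab[word] == "<end>":
                break
            caption.append(self.vocab[word])
        return caption


# Usage with a toy vocabulary and a stand-in for a CNN image embedding.
vocab = ["<start>", "<end>", "a", "dog", "is", "running", "on", "grass"]
model = TinyLSTMCaptioner(vocab, feat_dim=4)
features = [0.2, -0.1, 0.7, 0.05]
print(model.greedy_caption(features))
```

In a trained system the weights would be learned from image-caption pairs and greedy search is often replaced by beam search; the sketch keeps only the encoder-to-decoder hand-off that the abstract describes.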


References

Barnett WS (1992) Benefits of compensatory preschool education. J Hum Resour 27:279–312
Callison-Burch C, Osborne M, Koehn P (2006) Re-evaluation of the role of BLEU in machine translation research. In: 11th Conference of the European Chapter of the Association for Computational Linguistics
Chang YS (2018) Fine-grained attention for image caption generation. Multimed Tools Appl 77:2959–2971
Chen J, Dong W, Li M Image caption generator based on deep neural networks
Chen L, Zhang H, Xiao J, Nie L, Shao J, Liu W, Chua TS (2017) SCA-CNN: spatial and channel-wise attention in convolutional networks for image captioning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp 5659–5667
Cheng Q, Zhang Q, Fu P, Tu C, Li S (2018) A survey and analysis on automatic image annotation. Pattern Recogn 79:242–259
Cui Y, Yang G, Veit A, Huang X, Belongie S (2018) Learning to evaluate image captioning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp 5804–5812
Degadwala S, Vyas D, Biswas H, Chakraborty U, Saha S (2021) Image captioning using inception V3 transfer learning model. In: 2021 6th International Conference on Communication and Electronics Systems (ICCES). IEEE, pp 1103–1108
Denoual E, Lepage Y (2005) BLEU in characters: towards automatic MT evaluation in languages without word delimiters. In: Companion Volume to the Proceedings of Conference including Posters/Demos and Tutorial Abstracts
Donahue J, Hendricks LA, Guadarrama S, Rohrbach M, Venugopalan S, Saenko K, Darrell T (2015) Long-term recurrent convolutional networks for visual recognition and description. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp 2625–2634
Farhadi A, Hejrati M, Sadeghi MA, Young P, Rashtchian C, Hockenmaier J, Forsyth D (2010) Every picture tells a story: generating sentences from images. In: European Conference on Computer Vision. Springer, Berlin, pp 15–29
Fu K, Jin J, Cui R, Sha F, Zhang C (2017) Aligning where to see and what to tell: image captioning with region-based attention and scene-specific contexts. IEEE Trans Pattern Anal Mach Intell 39(12):2321–2334
Gong Y, Wang L, Hodosh M, Hockenmaier J, Lazebnik S (2014) Improving image-sentence embeddings using large, weakly annotated photo collections. In: European Conference on Computer Vision. Springer, Cham, pp 529–545
Gupta N, Jalal AS (2020) Integration of textual cues for fine-grained image captioning using deep CNN and LSTM. Neural Comput & Applic 32(24):17899–17908
Hibbin R (2016) The psychosocial benefits of oral storytelling in school: developing identity and empathy through narrative. Pastor Care Educ 34(4):218–231
Hodosh M, Young P, Hockenmaier J (2013) Framing image description as a ranking task: data, models, and evaluation metrics. J Artif Intell Res 47:853–899
Hossain M, Sohel F, Shiratuddin MF, Laga H (2018) A comprehensive study of deep learning for image captioning. arXiv preprint arXiv:1810.04020
Jent JF, Niec LN, Baker SE (2011) Play and interpersonal processes. In: Play in clinical practice: evidence-based approaches. Guilford Press, New York
Karpathy A, Fei-Fei L (2015) Deep visual-semantic alignments for generating image descriptions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp 3128–3137
Khan MF, Sadiq-Ur-Rahman SM, Islam MS (2021) Improved Bengali image captioning via deep convolutional neural network based encoder-decoder model. In: Proceedings of International Joint Conference on Advances in Computational Intelligence. Springer, Singapore, pp 217–229
Khosla A, Jayadevaprakash N, Yao B, Li FF (2011) Novel dataset for fine-grained image categorization: Stanford dogs. In: Proc. CVPR Workshop on Fine-Grained Visual Categorization (FGVC), vol. 2, no. 1
Kinghorn P, Zhang L, Shao L (2018) A region-based image caption generator with refined descriptions. Neurocomputing 272:416–424
Kiros R, Salakhutdinov R, Zemel RS (2014) Unifying visual-semantic embeddings with multimodal neural language models. arXiv preprint arXiv:1411.2539
Kuznetsova P, Ordonez V, Berg TL, Choi Y (2014) TreeTalk: composition and compression of trees for image descriptions. Trans Assoc Comput Linguist 2:351–362
Lemley J, Bazrafkan S, Corcoran P (2017) Smart augmentation learning an optimal data augmentation strategy. IEEE Access 5:5858–5869
Li L, Tang S, Zhang Y, Deng L, Tian Q (2018) GLA: global-local attention for image description. IEEE Trans Multimed 20:726–737
Lin CY (2004) ROUGE: a package for automatic evaluation of summaries. In: Text Summarization Branches Out
Mao J, Xu W, Yang Y, Wang J, Huang Z, Yuille A (2014) Deep captioning with multimodal recurrent neural networks (m-RNN). arXiv preprint arXiv:1412.6632
Minoofam SAH, Bastanfard A, Keyvanpour MR (2021) TRCLA: a transfer learning approach to reduce negative transfer for cellular learning automata. In: IEEE Transactions on Neural Networks and Learning Systems. IEEE. https://doi.org/10.1109/TNNLS.2021.3106705
Naqvi N, Ye Z (2020) Image captions: global-local and joint signals attention model (GL-JSAM). Multimed Tools Appl 79:24429–24448. https://doi.org/10.1007/s11042-020-09128-6
Papineni K, Roukos S, Ward T, Zhu WJ (2002) BLEU: a method for automatic evaluation of machine translation. In: Proceedings of the 40th Annual Meeting on Association for Computational Linguistics. Association for Computational Linguistics, pp 311–318
Perry BD, Szalavitz M (2010) Born for love: why empathy is essential—and endangered. HarperCollins e-Books
Shah P, Bakrola V, Pati S (2017) Image captioning using deep neural architectures. In: 2017 International Conference on Innovations in Information, Embedded and Communication Systems (ICIIECS). IEEE, pp 1–4
Soh M (2016) Learning CNN-LSTM architectures for image caption generation. Dept. Comput. Sci., Stanford Univ., Stanford, CA, USA, Tech. Rep
Sun C, Gan C, Nevatia R (2015) Automatic concept discovery from parallel text and visual corpora. In: Proceedings of the IEEE International Conference on Computer Vision. pp 2596–2604
Venter E (2017) Bridging the communication gap between Generation Y and the Baby Boomer generation. Int J Adolesc Youth 22(4):497–507. https://doi.org/10.1080/02673843.2016.1267022
Vinyals O, Toshev A, Bengio S, Erhan D (2015) Show and tell: a neural image caption generator. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp 3156–3164
Wang C, Yang H, Meinel C (2018) Image captioning with deep bidirectional LSTMs and multi-task learning. ACM Trans Multimed Comput Commun Appl 14(2s):40
Warin J (2011) Stories of self: tracking children's identity and wellbeing through the years of school. Educ Health 29(1):19–20
Wu Q, Shen C, Wang P, Dick A, van den Hengel A (2018) Image captioning and visual question answering based on attributes and external knowledge. IEEE Trans Pattern Anal Mach Intell 40(6):1367–1381
Xu K, Ba J, Kiros R, Cho K, Courville A, Salakhudinov R, Zemel RS, Bengio Y (2015) Show, attend and tell: neural image caption generation with visual attention. In: International Conference on Machine Learning. pp 2048–2057
Yao T, Pan Y, Li Y, Qiu Z, Mei T (2017) Boosting image captioning with attributes. In: Proceedings of the IEEE International Conference on Computer Vision. pp 4894–4902
Ye Z, Khan R, Naqvi N, Islam MS (2021) A novel automatic image caption generation using bidirectional long-short term memory framework. Multimed Tools Appl 80:25557–25582. https://doi.org/10.1007/s11042-021-10632-6
Yu F, Ip HH (2006) Automatic semantic annotation of images using spatial hidden Markov model. In: 2006 IEEE International Conference on Multimedia and Expo. IEEE, pp 305–308
Zhao D, Chang Z, Guo S (2019) A multimodal fusion approach for image captioning. Neurocomputing 329:476–485

Acknowledgments

This work is supported by the National Natural Science Foundation of China (No. 61671418) and the Advanced Research Fund of the University of Science and Technology of China. Conflicts of interest: the authors declare no conflict of interest.

Author information

Authors and Affiliations

University of Science and Technology of China (USTC), Hefei, People’s Republic of China
Nuzhat Naqvi, M. Shujah Islam, Mansoor Iqbal, Shamsa Kanwal & ZhongFu Ye
School of Computer Science, South China Normal University, Guangzhou, People’s Republic of China
Asad Khan


Corresponding author

Correspondence to Nuzhat Naqvi.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Naqvi, N., Islam, M.S., Iqbal, M. et al. Deep neural combinational model (DNCM): digital image descriptor for child’s independent learning. Multimed Tools Appl 81, 29955–29975 (2022). https://doi.org/10.1007/s11042-022-12291-7


Received: 21 January 2021
Revised: 06 January 2022
Accepted: 14 January 2022
Published: 05 April 2022
Issue Date: September 2022
DOI: https://doi.org/10.1007/s11042-022-12291-7
