
A novel automatic image caption generation using bidirectional long-short term memory framework

Published in: Multimedia Tools and Applications

Abstract

Image captioning, the task of generating a textual description of an image, has emerged as an active research topic owing to its practical importance in many domains. It is challenging because it draws on both Natural Language Processing and Computer Vision to generate captions. Although the literature reports notable image captioning methodologies, they still fall short of substantial performance across diverse datasets. This paper proposes an image caption generation mechanism based on an optimized Bidirectional Long Short-Term Memory (B-LSTM) model. We propose a variant of Moth Flame Optimization, termed Proposed Moth Flame Optimization (PMFO), whose logarithmic spiral update is based on correlation. The performance of the proposed model is demonstrated on benchmark datasets such as Flickr8k, Flickr30k, VizWiz and COCO, using well-known metrics such as CIDEr, BLEU, SPICE and ROUGE. The performance analysis shows that the B-LSTM model achieves better caption generation than state-of-the-art methods.
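The logarithmic spiral update at the heart of Moth Flame Optimization can be sketched as follows. This is the standard MFO update from Mirjalili's formulation (M' = D·e^{bt}·cos(2πt) + F, with D = |F − M| and t drawn from U(−1, 1)); the correlation-based modification that defines PMFO is specific to this paper and is not reproduced here, so the function name and parameters below are illustrative only.

```python
import numpy as np

def spiral_update(moth, flame, b=1.0, rng=None):
    """Standard MFO logarithmic-spiral position update:
    M' = D * exp(b*t) * cos(2*pi*t) + F, with D = |F - M| and t ~ U(-1, 1)."""
    rng = np.random.default_rng() if rng is None else rng
    d = np.abs(flame - moth)                      # distance of the moth from its flame
    t = rng.uniform(-1.0, 1.0, size=moth.shape)   # spiral parameter, one per dimension
    return d * np.exp(b * t) * np.cos(2.0 * np.pi * t) + flame
```

Because the distance term scales the spiral, a moth already sitting on its flame stays there, while distant moths take larger exploratory steps.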


Abbreviations

LSTM: Long Short-Term Memory

B-LSTM: Bidirectional Long Short-Term Memory

PMFO: Proposed Moth Flame Optimization

AI: Artificial Intelligence

NLP: Natural Language Processing

RNN: Recurrent Neural Network

CNN: Convolutional Neural Network

NN: Neural Network

SGC: Scene Graph Captioner

TA-LSTM: Triple Attention LSTM

VD-SAN: Visual-Densely Semantic Attention Network

DenseNet: Dense Convolutional Network

gLSTM: guidance LSTM

PIL: Python Imaging Library

c-RNN: character-level RNN
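The bidirectional LSTM that gives the B-LSTM its name processes the input sequence in both directions and concatenates the two hidden states at each step. The sketch below is a generic, minimal B-LSTM in plain NumPy, not the paper's trained architecture; the weight shapes and gate ordering are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_forward(xs, W, U, b):
    """Single-layer LSTM over xs of shape (T, d_in); returns hidden states (T, d_h).
    Stacked gate order in W (4*d_h, d_in), U (4*d_h, d_h), b (4*d_h,):
    input, forget, output, candidate."""
    d_h = U.shape[1]
    h, c = np.zeros(d_h), np.zeros(d_h)
    hs = []
    for x in xs:
        i, f, o, g = np.split(W @ x + U @ h + b, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)   # cell state update
        h = sigmoid(o) * np.tanh(c)                    # hidden state output
        hs.append(h)
    return np.array(hs)

def bilstm(xs, fwd_params, bwd_params):
    """Bidirectional LSTM: a forward pass plus a pass over the reversed sequence,
    re-reversed so step t of both directions aligns with input t, then
    concatenated into a (T, 2*d_h) representation."""
    hf = lstm_forward(xs, *fwd_params)
    hb = lstm_forward(xs[::-1], *bwd_params)[::-1]
    return np.concatenate([hf, hb], axis=1)
```

Concatenating the two directions lets each output position see both past and future context, which is what motivates using a B-LSTM rather than a unidirectional decoder for caption generation.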


Acknowledgments

This research is supported by the Fundamental Research Funds for the Central Universities (Grant no. WK2350000002).

Author information

Corresponding author

Correspondence to Zhongfu Ye.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article


Cite this article

Ye, Z., Khan, R., Naqvi, N. et al. A novel automatic image caption generation using bidirectional long-short term memory framework. Multimed Tools Appl 80, 25557–25582 (2021). https://doi.org/10.1007/s11042-021-10632-6
