Assamese news image caption generation using attention mechanism

Das, Ringki; Singh, Thoudam Doren

doi:10.1007/s11042-022-12042-8

Published: 14 February 2022

Volume 81, pages 10051–10069, (2022)

Multimedia Tools and Applications

Ringki Das¹ &
Thoudam Doren Singh¹

Abstract

Neural networks and deep learning have recently made significant contributions across many research domains. In the present work, we apply these techniques to automatic image caption generation, an artificial intelligence problem that draws attention from both computer vision and natural language processing researchers. Most caption generation work targets English, and to the best of our knowledge no work has yet been reported for Assamese. Assamese is an Indo-European language spoken by 14 million speakers in the North-East region of India. This paper reports image caption generation for the Assamese news domain. A quality image captioning system requires an annotated training corpus, but no standard dataset is available for this resource-constrained language. We therefore built a dataset of 13,000 images collected from various online local Assamese e-newspapers. We employ two architectures for generating news image captions: the first is based on CNN-LSTM and the second on the attention mechanism. The models are evaluated both qualitatively and quantitatively. Qualitative analysis of the generated captions is carried out in terms of fluency and adequacy scores on a standard rating scale. Quantitative results are measured with the BLEU and CIDEr evaluation metrics. We observe that the attention-based model outperforms the CNN-LSTM-based model on our task.
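The abstract does not state which attention formulation the second model uses. As a rough illustration of the general idea only, the sketch below assumes additive (Bahdanau-style) soft attention over a set of image region features, conditioned on a decoder hidden state. The function name `additive_attention`, the parameter names `W_r`, `W_h`, and `v`, and all dimensions are hypothetical and are not taken from the paper.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def additive_attention(regions, hidden, W_r, W_h, v):
    """Assumed additive soft attention (not necessarily the paper's variant).

    regions: list of image region feature vectors, each of size feat_dim
    hidden:  decoder hidden state of size hidden_dim
    W_r:     attn_dim x feat_dim projection for region features
    W_h:     attn_dim x hidden_dim projection for the hidden state
    v:       attn_dim scoring vector
    Returns (context, weights): the weighted sum of regions and the weights.
    """
    proj_h = matvec(W_h, hidden)
    scores = []
    for region in regions:
        proj_r = matvec(W_r, region)
        energy = [math.tanh(a + b) for a, b in zip(proj_r, proj_h)]
        scores.append(sum(vi * ei for vi, ei in zip(v, energy)))
    weights = softmax(scores)
    feat_dim = len(regions[0])
    context = [
        sum(w * region[d] for w, region in zip(weights, regions))
        for d in range(feat_dim)
    ]
    return context, weights


# Toy sizes chosen only for illustration.
random.seed(0)
num_regions, feat_dim, hidden_dim, attn_dim = 4, 3, 2, 5


def rand_matrix(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


regions = rand_matrix(num_regions, feat_dim)
hidden = [random.uniform(-1, 1) for _ in range(hidden_dim)]
W_r = rand_matrix(attn_dim, feat_dim)
W_h = rand_matrix(attn_dim, hidden_dim)
v = [random.uniform(-1, 1) for _ in range(attn_dim)]

context, weights = additive_attention(regions, hidden, W_r, W_h, v)
print("attention weights:", [round(w, 3) for w in weights])
print("context vector:", [round(c, 3) for c in context])
```

In a captioning decoder of this kind, the context vector would typically be recomputed at every time step and fed to the recurrent unit together with the previous word, so that each generated word can draw on different image regions. A plain CNN-LSTM model, by contrast, usually conditions on a single fixed image vector.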

Author information

Authors and Affiliations

Department of Computer Science and Engineering, National Institute of Technology, Silchar, India
Ringki Das & Thoudam Doren Singh

Corresponding author

Correspondence to Ringki Das.

About this article

Cite this article

Das, R., Singh, T.D. Assamese news image caption generation using attention mechanism. Multimed Tools Appl 81, 10051–10069 (2022). https://doi.org/10.1007/s11042-022-12042-8

Received: 06 January 2021
Revised: 04 June 2021
Accepted: 04 January 2022
Published: 14 February 2022
Issue Date: March 2022
DOI: https://doi.org/10.1007/s11042-022-12042-8
