Abstract
Image captioning has gained attention due to recent developments in deep neural architectures. However, the gap between semantic concepts and visual features remains a major challenge in image caption generation. In this paper we develop a method that uses both visual features and semantic features for caption generation. We briefly discuss the various architectures used for visual feature extraction and the Long Short-Term Memory (LSTM) network used for caption generation. An object recognition model is developed to identify semantic tags in the images, and these tags are encoded along with the visual features for the captioning task. We build an encoder-decoder architecture that combines the semantic details with the language model for caption generation. We evaluate our model on the standard Flickr8k, Flickr30k and MSCOCO datasets using the standard BLEU and METEOR metrics.
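The abstract describes an encoder-decoder pipeline in which CNN visual features are fused with encoded semantic tags before an LSTM decoder generates the caption. The sketch below is only an illustration of that general idea, not the authors' exact model: it assumes a PyTorch setting, a pretrained ResNet-50 visual encoder, and a multi-hot tag vector, and all layer names and sizes are illustrative assumptions.

```python
# Minimal sketch of a CNN-LSTM captioner that fuses visual features with
# semantic tags (illustrative only; not the paper's exact configuration).
import torch
import torch.nn as nn
import torchvision.models as models


class CaptionModel(nn.Module):
    def __init__(self, vocab_size, tag_vocab_size, embed_dim=256, hidden_dim=512):
        super().__init__()
        # Visual encoder: pretrained CNN with its classification head removed.
        cnn = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
        self.cnn = nn.Sequential(*list(cnn.children())[:-1])   # -> (B, 2048, 1, 1)
        for p in self.cnn.parameters():
            p.requires_grad = False                             # keep encoder frozen
        # Project visual features and the multi-hot semantic-tag vector into a
        # common embedding that seeds the LSTM decoder.
        self.visual_proj = nn.Linear(2048, embed_dim)
        self.tag_proj = nn.Linear(tag_vocab_size, embed_dim)
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, images, tags, captions):
        # images: (B, 3, H, W); tags: (B, tag_vocab_size) multi-hot; captions: (B, T) token ids
        with torch.no_grad():
            v = self.cnn(images).flatten(1)                     # (B, 2048)
        fused = self.visual_proj(v) + self.tag_proj(tags)       # (B, embed_dim)
        words = self.embed(captions)                            # (B, T, embed_dim)
        # Prepend the fused image/tag embedding as the first decoder input
        # (teacher forcing during training).
        inputs = torch.cat([fused.unsqueeze(1), words], dim=1)
        out, _ = self.lstm(inputs)
        return self.fc(out)                                     # (B, T+1, vocab_size)


# Example usage with dummy data and hypothetical vocabulary sizes.
model = CaptionModel(vocab_size=10000, tag_vocab_size=300)
logits = model(torch.randn(2, 3, 224, 224),
               torch.zeros(2, 300),
               torch.randint(0, 10000, (2, 15)))
```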
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Aravindkumar, S., Varalakshmi, P., Hemalatha, M. (2020). Generation of Image Caption Using CNN-LSTM Based Approach. In: Abraham, A., Cherukuri, A.K., Melin, P., Gandhi, N. (eds) Intelligent Systems Design and Applications. ISDA 2018. Advances in Intelligent Systems and Computing, vol 940. Springer, Cham. https://doi.org/10.1007/978-3-030-16657-1_43
DOI: https://doi.org/10.1007/978-3-030-16657-1_43
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-16656-4
Online ISBN: 978-3-030-16657-1
eBook Packages: Intelligent Technologies and Robotics (R0)