DOI: 10.1145/3573942.3574072

Research on Image Description Generation Method Based on G-AoANet

Published: 16 May 2023

Abstract

Most image description generation methods in the attention-based encoder-decoder framework extract only local features from images. Although local features carry a relatively high semantic level, two problems remain: object loss, where important objects may be omitted from the generated description, and prediction error, where an object may be assigned to the wrong class. This paper proposes a G-AoANet model to address these problems. The model uses an attention mechanism to combine global features with local features, allowing it to selectively focus on both object and contextual information and thereby improve the quality of the generated descriptions. Experimental results show that the model improves the initially reported best CIDEr-D and SPICE scores on the MS COCO dataset by 9.3% and 5.1%, respectively.
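The core idea, attending over local region features while keeping the global image feature in play, can be illustrated with a minimal sketch. This is not the authors' G-AoANet implementation; the function names, the use of the global feature as the attention query, and the projection matrices `w_q`/`w_k` are assumptions made for illustration only.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def fuse_global_local(local_feats, global_feat, w_q, w_k):
    """Attend over local region features using the global feature as the
    query, then concatenate the attended context with the global feature.

    local_feats: (n_regions, d) region-level features
    global_feat: (d,) image-level feature
    w_q, w_k:    (d, d_attn) hypothetical projection matrices
    """
    q = global_feat @ w_q                    # (d_attn,) query from global feature
    k = local_feats @ w_k                    # (n_regions, d_attn) keys from regions
    scores = k @ q / np.sqrt(k.shape[-1])    # scaled dot-product scores
    weights = softmax(scores)                # (n_regions,) attention weights
    context = weights @ local_feats          # (d,) attended local context
    return np.concatenate([context, global_feat])  # (2d,) fused feature

rng = np.random.default_rng(0)
d, n = 8, 5
fused = fuse_global_local(rng.normal(size=(n, d)), rng.normal(size=d),
                          rng.normal(size=(d, d)), rng.normal(size=(d, d)))
print(fused.shape)  # prints (16,)
```

Because the global feature both drives the attention weights and survives the concatenation, the fused representation retains contextual information even when the attention concentrates on a few regions, which is the intuition behind mitigating object loss and class-prediction errors.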



    Published In
    AIPR '22: Proceedings of the 2022 5th International Conference on Artificial Intelligence and Pattern Recognition
    September 2022
    1221 pages
    ISBN:9781450396899
    DOI:10.1145/3573942

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. Encoder-decoder
    2. Global attention
    3. Image description

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    AIPR 2022

