Image Description Generation Method Based on X-Linear Attention Mechanism

Authors:

Ruixue Shen

AIPR '22: Proceedings of the 2022 5th International Conference on Artificial Intelligence and Pattern Recognition

Pages 581 - 586

https://doi.org/10.1145/3573942.3574065

Published: 16 May 2023

Abstract

Existing image description models cannot model high-order interactions between multimodal features. To address this, the paper introduces the X-Linear attention mechanism, which uses bilinear pooling and the ELU activation function to model high-order interactions between multimodal features. The X-Linear attention mechanism also applies spatial and channel attention to strengthen the model's representational ability and the quality of the generated image description sentences. Experimental results on the MSCOCO dataset show that the method is effective and improves on each evaluation metric.
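The abstract names three ingredients: bilinear pooling, ELU activation, and spatial plus channel attention. The following is a minimal NumPy sketch of how these pieces can fit together in one attention step. It is not the paper's implementation: the low-rank (element-wise product) form of bilinear pooling, the squeeze-and-sigmoid channel gate, the function names `init_params` and `x_linear_attention`, and all weight shapes are assumptions chosen for illustration.

```python
import numpy as np

def elu(x):
    # ELU: identity for positive inputs, exp(x) - 1 otherwise.
    return np.where(x > 0, x, np.exp(np.minimum(x, 0.0)) - 1.0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_params(rng, d_q, d_k, d_v, d, d_h):
    # Hypothetical random weights; a trained model would learn these.
    shapes = {"W_qk": (d, d_q), "W_k": (d, d_k), "W_b": (d_h, d),
              "w_s": (d_h,), "W_c": (d, d_h),
              "W_qv": (d, d_q), "W_v": (d, d_v)}
    return {name: 0.1 * rng.normal(size=s) for name, s in shapes.items()}

def x_linear_attention(query, keys, values, p):
    """query: (d_q,), keys: (n, d_k), values: (n, d_v) for n image regions."""
    # Low-rank bilinear pooling of query and keys: project both sides,
    # apply ELU, then take the element-wise product.
    joint_k = elu(keys @ p["W_k"].T) * elu(p["W_qk"] @ query)    # (n, d)
    hidden = elu(joint_k @ p["W_b"].T)                           # (n, d_h)

    # Spatial attention: one softmax weight per region.
    spatial = softmax(hidden @ p["w_s"])                         # (n,)

    # Channel attention: average over regions, then a sigmoid gate.
    channel = sigmoid(p["W_c"] @ hidden.mean(axis=0))            # (d,)

    # Low-rank bilinear pooling of query and values.
    joint_v = elu(values @ p["W_v"].T) * elu(p["W_qv"] @ query)  # (n, d)

    # Weight regions spatially, then gate the result channel-wise.
    attended = channel * (spatial @ joint_v)                     # (d,)
    return attended, spatial, channel

rng = np.random.default_rng(0)
params = init_params(rng, d_q=8, d_k=6, d_v=6, d=5, d_h=4)
attended, spatial, channel = x_linear_attention(
    rng.normal(size=8), rng.normal(size=(4, 6)), rng.normal(size=(4, 6)), params)
print(attended.shape, spatial.sum())
```

In this sketch the element-wise product is what introduces second-order query-key and query-value interactions, and stacking such blocks would raise the interaction order further. Whether the paper stacks blocks or uses these exact projections is not stated in the abstract.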

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision


Published In

AIPR '22: Proceedings of the 2022 5th International Conference on Artificial Intelligence and Pattern Recognition

September 2022

1221 pages

ISBN:9781450396899

DOI:10.1145/3573942

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

AIPR 2022

AIPR 2022: 2022 5th International Conference on Artificial Intelligence and Pattern Recognition

September 23 - 25, 2022

Xiamen, China
