DOI: 10.1145/3469877.3495642
Research Article

Focusing Attention across Multiple Images for Multimodal Event Detection

Published: 10 January 2022

Abstract

Multimodal social event detection has attracted tremendous research attention in recent years, because it provides a comprehensive and complementary understanding of social events and is important for public security and administration. Most existing works focus on the fusion of multimodal information, especially the fusion of a single image with text. Such single image-text pair processing breaks the correlations between images of the same post and may hurt the accuracy of event detection. In this work, we propose to focus attention across multiple images for multimodal event detection, which is also more suitable for tweets with short text and multiple images. Towards this end, we design a novel Multi-Image Focusing Network (MIFN) to connect textual content with the visual aspects of multiple images. MIFN consists of a feature extractor, a multi-focal network, and an event classifier. The multi-focal network implements focal attention across all the images and fuses the most relevant regions with the text into a multimodal representation. The event classifier then predicts the social event class from the multimodal representation. To evaluate the effectiveness of the proposed approach, we conduct extensive experiments on a commonly used disaster dataset. The experimental results demonstrate that MIFN outperforms all baselines on both the humanitarian event detection task and its hurricane-disaster variant. Ablation studies also show that MIFN filters out irrelevant regions across images, which improves the accuracy of multimodal event detection.
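The abstract describes the MIFN pipeline (feature extractor, multi-focal network, event classifier) only at a high level. The following is a minimal PyTorch sketch of what "focal attention across all the images" could look like: region features gathered from every image of a post are scored against the text, the least relevant regions are masked out, and the surviving regions are fused with the text for event classification. All module names, dimensions, and the top-k filtering rule here are illustrative assumptions, not the authors' published implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiFocalAttention(nn.Module):
    """Hypothetical sketch of focal attention over regions pooled from
    all images of a post, reconstructed from the abstract alone."""

    def __init__(self, text_dim=768, region_dim=2048, hidden_dim=512,
                 num_classes=8, top_k=16):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, hidden_dim)      # project text query
        self.region_proj = nn.Linear(region_dim, hidden_dim)  # project region keys
        self.top_k = top_k                                    # assumed filtering rule
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, text_feat, region_feats):
        # text_feat:    (B, text_dim)      pooled text representation of the post
        # region_feats: (B, N, region_dim) regions gathered from *all* images
        q = self.text_proj(text_feat)                         # (B, H)
        k = self.region_proj(region_feats)                    # (B, N, H)
        scores = torch.einsum('bh,bnh->bn', q, k) / k.size(-1) ** 0.5

        # Focal step (assumed form): keep only the top-k regions most related
        # to the text and mask out the rest, so irrelevant regions across
        # images are filtered before fusion.
        keep = min(self.top_k, scores.size(-1))
        topk_idx = scores.topk(keep, dim=-1).indices
        mask = torch.full_like(scores, float('-inf')).scatter(-1, topk_idx, 0.0)
        attn = F.softmax(scores + mask, dim=-1)               # (B, N)

        fused = torch.einsum('bn,bnh->bh', attn, k)           # attended visual context
        multimodal = torch.cat([q, fused], dim=-1)            # fuse text with vision
        return self.classifier(multimodal)                    # event class logits

# Toy usage: a post with 3 images x 36 regions each -> 108 candidate regions.
model = MultiFocalAttention()
logits = model(torch.randn(2, 768), torch.randn(2, 108, 2048))
```

Pooling regions from all images of a post into one candidate set, rather than attending to each image-text pair separately, is what preserves the cross-image correlations the abstract argues single-pair fusion discards.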


Cited By

  • (2023) MMpedia: A Large-Scale Multi-modal Knowledge Graph. The Semantic Web – ISWC 2023, pp. 18–37. DOI: 10.1007/978-3-031-47243-5_2. Online publication date: 6 November 2023.
  • (2023) Multimodal Conditional VAE for Zero-Shot Real-World Event Discovery. Advanced Data Mining and Applications, pp. 644–659. DOI: 10.1007/978-3-031-46664-9_43. Online publication date: 27 August 2023.


Published In

MMAsia '21: Proceedings of the 3rd ACM International Conference on Multimedia in Asia
December 2021, 508 pages
ISBN: 9781450386074
DOI: 10.1145/3469877

Publisher

Association for Computing Machinery, New York, NY, United States


        Author Tags

        1. Focal Attention
        2. Multi-focal Network
        3. Multimodal Social Event Detection


        Funding Sources

        • Science and technology development fund of CETC

Conference

MMAsia '21: ACM Multimedia Asia
December 1–3, 2021, Gold Coast, Australia

Acceptance Rates

Overall acceptance rate: 59 of 204 submissions, 29%

