
Multimodal Relation Extraction via a Mixture of Hierarchical Visual Context Learners

Published: 13 May 2024 Publication History

Abstract

Multimodal relation extraction is a fundamental task of multimodal information extraction. Recent studies have shown promising results by integrating hierarchical visual features, from local regions such as image patches to global regions spanning the entire image. However, research to date has largely overlooked how hierarchical visual semantics are represented and which of their characteristics benefit relation extraction. To bridge this gap, we propose a novel two-stage hierarchical visual context fusion transformer that incorporates a mixture-of-multimodal-experts framework to effectively represent hierarchical visual features and integrate them into textual semantic representations. In addition, we introduce the concept of hierarchical tracking maps to facilitate understanding of the intrinsic mechanisms of image information processing in multimodal models. We thoroughly investigate the implications of hierarchical visual contexts along four dimensions: performance evaluation, the nature of auxiliary visual information, the patterns observed in the image encoding hierarchy, and the significance of various visual encoding levels. Empirical studies show that our approach achieves new state-of-the-art performance on the MNRE dataset.
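The abstract does not give implementation details, but the core mixture-of-experts idea it builds on can be sketched as follows: each expert transforms a token representation, and a learned gating network softly combines the expert outputs per token. All names, shapes, and the use of random weights here are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

# Illustrative sketch of mixture-of-experts fusion (NOT the paper's code).
# Each expert is a simple linear map; a gating network assigns each token
# a softmax-normalized weight over experts, and outputs are mixed per token.
rng = np.random.default_rng(0)

d_model, n_experts, n_tokens = 8, 3, 4

# Token representations (e.g., fused textual + hierarchical visual features).
tokens = rng.normal(size=(n_tokens, d_model))

# Hypothetical experts: one linear transform each.
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]

# Gating network: projects each token to one logit per expert.
gate_w = rng.normal(size=(d_model, n_experts))

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

gates = softmax(tokens @ gate_w)                              # (n_tokens, n_experts)
expert_out = np.stack([tokens @ w for w in experts], axis=1)  # (n_tokens, n_experts, d_model)
output = (gates[..., None] * expert_out).sum(axis=1)          # (n_tokens, d_model)

print(output.shape)                      # (4, 8)
print(np.allclose(gates.sum(axis=1), 1)) # True: gate weights sum to 1 per token
```

In the paper's setting the experts would operate over hierarchical visual contexts (patch-level through image-level features) rather than random linear maps, but the gating-and-mixing mechanism is the same.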

Supplemental Material

MP4 File
Supplemental video


Cited By

  • (2024)Focus & Gating: A Multimodal Approach for Unveiling Relations in Noisy Social MediaProceedings of the 32nd ACM International Conference on Multimedia10.1145/3664647.3680995(1379-1388)Online publication date: 28-Oct-2024
  • (2024)Text-Guided Hierarchical Visual Prefix Network for Multimodal Relation Extraction2024 4th International Conference on Electronic Information Engineering and Computer Science (EIECS)10.1109/EIECS63941.2024.10800016(1051-1055)Online publication date: 27-Sep-2024

Published In

cover image ACM Conferences
WWW '24: Proceedings of the ACM Web Conference 2024
May 2024
4826 pages
ISBN:9798400701719
DOI:10.1145/3589334
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. multimodal fusion
  2. multimodal relation extraction

Qualifiers

  • Research-article

Conference

WWW '24: The ACM Web Conference 2024
May 13 - 17, 2024
Singapore, Singapore

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Article Metrics

  • Downloads (Last 12 months)406
  • Downloads (Last 6 weeks)39
Reflects downloads up to 05 Mar 2025

