Attention-Based Multimodal Entity Linking with High-Quality Images

Zhang, Li; Li, Zhixu; Yang, Qiang

doi:10.1007/978-3-030-73197-7_35

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12682)

Included in the following conference series:

International Conference on Database Systems for Advanced Applications

Abstract

Multimodal entity linking (MEL) is an emerging research field that uses both textual and visual information to map an ambiguous mention to an entity in a knowledge base (KB). However, images do not always help, and they may even backfire when they are irrelevant to the textual content. Moreover, existing efforts mainly focus on learning representations of mentions and entities from their textual and visual contexts, without considering the negative impact of noisy, irrelevant images, which occur frequently in social media posts. In this paper, we propose a novel MEL model that not only removes the negative impact of noisy images but also uses multiple attention mechanisms to better capture the connection between a mention representation and its corresponding entity representation. Our empirical study on a large real data collection demonstrates the effectiveness of our approach.
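
The two ideas named in the abstract, discarding irrelevant images and attention-weighting the remaining modalities before matching a mention to an entity, can be illustrated with a toy sketch. This is not the authors' model: the cosine-similarity gate, the `relevance_threshold` value, the softmax weighting over two modalities, the function names, and the hand-written vectors are all assumptions made for illustration, and a real system would use learned text and image encoders.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_mention(text_vec, image_vec, relevance_threshold=0.3):
    """Build a mention vector from text and an optional image.

    Assumption: an image counts as noisy when its cosine similarity to the
    text falls below `relevance_threshold` (a hypothetical value); noisy
    images are dropped and the text vector is used alone.
    """
    if image_vec is None or cosine(text_vec, image_vec) < relevance_threshold:
        return list(text_vec)
    # Attention over the two modalities, with the text vector as the query.
    weights = softmax([cosine(text_vec, text_vec), cosine(text_vec, image_vec)])
    w_text, w_image = weights
    return [w_text * t + w_image * i for t, i in zip(text_vec, image_vec)]


def link(mention_vec, candidates):
    """Return the candidate entity whose vector is most similar to the mention."""
    return max(candidates, key=lambda name: cosine(mention_vec, candidates[name]))


# Hand-written toy vectors (hypothetical, not from any dataset).
text = [1.0, 0.0, 0.0]
noisy_image = [0.0, 0.0, 1.0]      # unrelated to the text, so it is gated out
relevant_image = [0.8, 0.6, 0.0]   # related to the text, so it is fused in
candidates = {
    "apple_inc": [0.9, 0.4, 0.0],
    "apple_fruit": [0.1, 0.0, 0.9],
}

gated = fuse_mention(text, noisy_image)
fused = fuse_mention(text, relevant_image)
best_with_noise = link(gated, candidates)
best_with_image = link(fused, candidates)
```

In this toy setup the noisy image never reaches the mention vector, so it cannot pull the mention toward the wrong candidate; the relevant image shifts the mention vector toward the matching entity.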

Acknowledgments

This research is supported by the National Key R&D Program of China (No. 2018-AAA0101900), the Priority Academic Program Development of Jiangsu Higher Education Institutions, the National Natural Science Foundation of China (Grant Nos. 62072323, 61632016), the Natural Science Foundation of Jiangsu Province (No. BK20191420), and the Suda-Toycloud Data Intelligence Joint Laboratory.

Author information

Authors and Affiliations

School of Computer Science and Technology, Soochow University, Suzhou, China
Li Zhang & Zhixu Li
IFLYTEK Research, Suzhou, China
Zhixu Li
King Abdullah University of Science and Technology, Jeddah, Saudi Arabia
Qiang Yang

Corresponding author

Correspondence to Zhixu Li.

Editor information

Editors and Affiliations

Aalborg University, Aalborg, Denmark
Christian S. Jensen
Singapore Management University, Singapore, Singapore
Ee-Peng Lim
Academia Sinica, Taipei, Taiwan
De-Nian Yang
The Pennsylvania State University, University Park, PA, USA
Wang-Chien Lee
National Chiao Tung University, Hsinchu, Taiwan
Vincent S. Tseng
Athens University of Economics and Business, Athens, Greece
Vana Kalogeraki
National Cheng Kung University, Tainan City, Taiwan
Jen-Wei Huang
National Tsing Hua University, Hsinchu, Taiwan
Chih-Ya Shen

About this paper

Cite this paper

Zhang, L., Li, Z., Yang, Q. (2021). Attention-Based Multimodal Entity Linking with High-Quality Images. In: Jensen, C.S., et al. Database Systems for Advanced Applications. DASFAA 2021. Lecture Notes in Computer Science, vol 12682. Springer, Cham. https://doi.org/10.1007/978-3-030-73197-7_35

DOI: https://doi.org/10.1007/978-3-030-73197-7_35
Published: 06 April 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-73196-0
Online ISBN: 978-3-030-73197-7
eBook Packages: Computer Science, Computer Science (R0)
