Abstract
Multimodal entity linking (MEL) is an emerging research field which uses both textual and visual information to map an ambiguous mention to an entity in a knowledge base (KB). However, images do not always help, which may also backfire if they are irrelevant to the textual content at all. Besides, the existing efforts mainly focus on learning a representation of both mentions and entities from their textual and visual contexts, without considering the negative impact brought by noisy irrelevant images, which happens frequently with social media posts. In this paper, we propose a novel MEL model, which not only removes the negative impact of noisy images, but also uses multiple attention mechanism to better capture the connection between mention representation and its corresponding entity representation. Our empirical study on a large real data collection demonstrates the effectiveness of our approach.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Adjali, O., Besançon, R., Ferret, O., Le Borgne, H., Grau, B.: Multimodal entity linking for Tweets. In: Jose, J.M., et al. (eds.) ECIR 2020, Part I. LNCS, vol. 12035, pp. 463–478. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-45439-5_31
Cheng, J., et al.: Entity linking for Chinese short texts based on BERT and entity name embeddings. In: China Conference on Knowledge Graph and Semantic Computing (CCKS) (2019). https://conference.bj.bcebos.com/ccks2019/eval/webpage/pdfs/eval_paper_2_1.pdf
Chong, W.-H., Lim, E.-P., Cohen, W.: Collective entity linking in Tweets over space and time. In: Jose, J.M., Hauff, C., Altıngovde, I.S., Song, D., Albakour, D., Watt, S., Tait, J. (eds.) ECIR 2017. LNCS, vol. 10193, pp. 82–94. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-56608-5_7
Csomai, A., Mihalcea, R.: Linking documents to encyclopedic knowledge. IEEE Intell. Syst. 23(5), 34–41 (2008)
Cucerzan, S.: Large-scale named entity disambiguation based on Wikipedia data. In: Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pp. 708–716 (2007)
Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)
Dredze, M., Andrews, N., Deyoung, J.: Twitter at the Grammys: a social media corpus for entity linking and disambiguation. In: International Workshop on Natural Language Processing for Social Media (2016)
Fang, Y., Chang, M.W.: Entity linking on microblogs with spatial and temporal signals. Trans. Assoc. Comput. Linguist. 2, 259–272 (2014)
Hua, W., Zheng, K., Zhou, X.: Microblog entity linking with social temporal context, pp. 1761–1775 (2015)
Huang, D., Wang, J.: An approach on Chinese microblog entity linking combining Baidu Encyclopaedia and word2vec. Procedia Comput. Sci. 111, 37–45 (2017)
Liu, X., Li, Y., Wu, H., Ming, Z., Yi, L.: Entity linking for Tweets. In: Meeting of the Association for Computational Linguistics (2017)
Liu, Y., Li, H., Garcia-Duran, A., Niepert, M., Onoro-Rubio, D., Rosenblum, D.S.: MMKG: multi-modal knowledge graphs. In: Hitzler, P., Hitzler, P., et al. (eds.) ESWC 2019. LNCS, vol. 11503, pp. 459–474. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-21348-0_30
Ma, C., Sha, Y., Tan, J., Guo, L., Peng, H.: Chinese social media entity linking based on effective context with topic semantics. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 386–395. IEEE (2019)
Mihalcea, R., Csomai, A.: Wikify! linking documents to encyclopedic knowledge. In: Proceedings of the Sixteenth ACM Conference on Conference on Information and Knowledge Management, pp. 233–242 (2007)
Moon, S., Neves, L., Carvalho, V.: Multimodal named entity disambiguation for noisy social media posts. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 2000–2008 (2018)
Mousselly-Sergieh, H., Botschen, T., Gurevych, I., Roth, S.: A multimodal translation-based approach for knowledge graph representation learning. In: Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics, pp. 225–234 (2018)
Nguyen, T.H., Fauceglia, N.R., Muro, M.R., Hassanzadeh, O., Gliozzo, A., Sadoghi, M.: Joint learning of local and global features for entity linking via neural networks. In: Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pp. 2310–2320 (2016)
Pezeshkpour, P., Chen, L., Singh, S.: Embedding multimodal relational data for knowledge base completion. arXiv preprint arXiv:1809.01341 (2018)
Shen, W., Wang, J., Han, J.: Entity linking with a knowledge base: issues, techniques, and solutions. IEEE Trans. Knowl. Data Eng. 27(2), 443–460 (2014)
Shen, W., Wang, J., Luo, P., Wang, M.: Linking named entities in tweets with knowledge base via user interest modeling. In: ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (2013)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2818–2826 (2016)
Tao, Z., Wei, Y., Wang, X., He, X., Huang, X., Chua, T.S.: MGAT: multimodal graph attention network for recommendation. Inf. Process. Manage. 57(5), 102277 (2020)
Yang, Z., Zheng, B., Li, G., Zhao, X., Zhou, X., Jensen, C.S.: Adaptive top-k overlap set similarity joins. In: 2020 IEEE 36th International Conference on Data Engineering (ICDE), pp. 1081–1092. IEEE (2020)
Yen, A.Z., Huang, H.H., Chen, H.H.: Multimodal joint learning for personal knowledge base construction from Twitter-based lifelogs. Inf. Process. Manage. 57(6), 102148 (2019)
Yin, X., Huang, Y., Zhou, B., Li, A., Lan, L., Jia, Y.: Deep entity linking via eliminating semantic ambiguity with BERT. IEEE Access 7, 169434–169445 (2019)
Zheng, B., et al.: Online trichromatic pickup and delivery scheduling in spatial crowdsourcing. In: 2020 IEEE 36th International Conference on Data Engineering (ICDE), pp. 973–984. IEEE (2020)
Zheng, B., Zhao, X., Weng, L., Hung, N.Q.V., Liu, H., Jensen, C.S.: PM-LSH: a fast and accurate lSH framework for high-dimensional approximate NN search. Proceedings of the VLDB Endow. 13(5), 643–655 (2020)
Zheng, B., et al.: Answering why-not group spatial keyword queries. IEEE Trans. Knowl. Data Eng. 32(1), 26–39 (2018)
Zhu, Y., Zhang, C., Ré, C., Fei-Fei, L.: Building a large-scale multimodal knowledge base system for answering visual queries. arXiv preprint arXiv:1507.05670 (2015)
Acknowledgments
This research is supported by National Key R&D Program of China (No. 2018-AAA0101900), the Priority Academic Program Development of Jiangsu Higher Education Institutions, National Natural Science Foundation of China (Grant No. 62072323, 61632016), Natural Science Foundation of Jiangsu Province (No. BK20191420), and the Suda-Toycloud Data Intelligence Joint Laboratory.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Zhang, L., Li, Z., Yang, Q. (2021). Attention-Based Multimodal Entity Linking with High-Quality Images. In: Jensen, C.S., et al. Database Systems for Advanced Applications. DASFAA 2021. Lecture Notes in Computer Science(), vol 12682. Springer, Cham. https://doi.org/10.1007/978-3-030-73197-7_35
Download citation
DOI: https://doi.org/10.1007/978-3-030-73197-7_35
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-73196-0
Online ISBN: 978-3-030-73197-7
eBook Packages: Computer ScienceComputer Science (R0)