Enriching Context Information for Entity Linking with Web Data

  • Regular Paper
Journal of Computer Science and Technology

Abstract

Entity linking (EL) is the task of determining the identity of textual entity mentions given a predefined knowledge base (KB). Many existing approaches tackle this task using either “local” information (the contextual information of the mention in the text) or “global” information (relations among candidate entities). However, local or global information alone may be insufficient, especially when the given text is short. To obtain richer local and global information for entity linking, we propose to enrich the context information of mentions with extra contexts retrieved from the web through web search engines (WSE). Based on this intuition, we make two novel attempts. The first adds web search results to an embedding-based method to expand the mention’s local information; here we try two different methods to help generate high-quality web contexts: one applies the attention mechanism and the other uses an abstract extraction method. The second uses the web contexts to extend the global information, i.e., it finds and uses additional relevant mentions from the web contexts with a graph-based model. Finally, we combine the two proposed models to use both the extended local and the extended global information from the extra web contexts. Our empirical study on six real-world datasets shows that using extra web contexts to extend the local and global information can effectively improve the F1 score of entity linking.
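
As a rough illustration of the first idea (expanding a mention's local information with web contexts weighted by attention), the following is a minimal sketch and not the model from the paper. It assumes that the mention context, the web snippets, and the candidate entities are already embedded as plain vectors. The function names (`enrich_context`, `link`), the toy two-dimensional vectors, and the choice of cosine similarity as the attention score are all hypothetical. The graph-based global model is not covered.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def softmax(scores):
    """Numerically stable softmax over a non-empty list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def enrich_context(mention_vec, web_vecs):
    """Add an attention-weighted average of web snippet vectors to the mention context.

    Assumption: attention scores are cosine similarities between the original
    context vector and each web snippet vector.
    """
    if not web_vecs:
        return list(mention_vec)
    weights = softmax([cosine(mention_vec, w) for w in web_vecs])
    pooled = [
        sum(wt * w[i] for wt, w in zip(weights, web_vecs))
        for i in range(len(mention_vec))
    ]
    return [m + p for m, p in zip(mention_vec, pooled)]


def link(mention_vec, web_vecs, candidates):
    """Return the candidate entity whose vector is closest to the enriched context."""
    enriched = enrich_context(mention_vec, web_vecs)
    return max(candidates, key=lambda name: cosine(enriched, candidates[name]))


# Toy data (hypothetical): a short text leaves the mention "Jordan" ambiguous,
# because its context vector is equally close to both candidates.
candidates = {"Michael Jordan": [1.0, 0.0], "Jordan (country)": [0.0, 1.0]}
short_context = [1.0, 1.0]
web_snippets = [[0.1, 0.9], [0.0, 1.0]]  # web results that lean toward the country sense

best = link(short_context, web_snippets, candidates)
```

In this toy setup the short context alone cannot separate the two candidates, while the web snippets shift the enriched vector toward one of them, which is the intuition the abstract describes for short texts.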

Author information

Corresponding author

Correspondence to Zhi-Xu Li.

Electronic supplementary material

ESM 1

(PDF 479 kb)

About this article

Cite this article

Wang, YT., Shen, J., Li, ZX. et al. Enriching Context Information for Entity Linking with Web Data. J. Comput. Sci. Technol. 35, 724–738 (2020). https://doi.org/10.1007/s11390-020-0280-1

  • DOI: https://doi.org/10.1007/s11390-020-0280-1
