Poster
DOI: 10.1145/2487575.2487681

Mining evidences for named entity disambiguation

Published: 11 August 2013

Abstract

Named entity disambiguation is the task of disambiguating named entity mentions in natural language text and linking them to their corresponding entries in a knowledge base such as Wikipedia. Such disambiguation can enhance readability and add semantics to plain text. It is also a central step in constructing a high-quality information network or knowledge graph from unstructured text. Previous research has tackled this problem using various textual and structural features from a knowledge base. Most of the proposed algorithms assume that the knowledge base provides enough explicit and useful information to disambiguate a mention to the right entity. However, existing knowledge bases are rarely complete (and likely never will be), which leads to poor performance on short queries whose contexts are not well known. In such cases, we need to collect additional evidence scattered across internal and external corpora to augment the knowledge base and enhance its disambiguation power. In this work, we propose a generative model and an incremental algorithm to automatically mine useful evidence across documents. By explicitly modeling a "background topic" and "unknown entities", our model is able to harvest useful evidence from noisy information. Experimental results show that our method significantly outperforms state-of-the-art approaches, boosting disambiguation accuracy from 43% (baseline) to 86% on short queries derived from tweets.
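The abstract only names the ingredients of the approach, so the sketch below is a hypothetical toy, not the authors' generative model or incremental algorithm. It assumes that each context word is drawn either from a shared background topic or from the candidate entity's word distribution, that an extra `<UNKNOWN>` candidate with a flat distribution stands in for entities missing from the knowledge base, and that contexts linked with high confidence are folded back into that entity's evidence as fractional word counts. The names `score_candidates` and `mine_evidence`, the mixing weight `LAMBDA_BG`, the smoothing constant `ALPHA`, and the confidence `threshold` are all illustrative assumptions.

```python
import math
from collections import Counter

# Hypothetical toy model (NOT the paper's): every context word is drawn either from
# a shared background topic (probability LAMBDA_BG) or from the word distribution
# of the entity the mention refers to. An extra UNKNOWN candidate with a flat word
# distribution absorbs mentions that no known entity explains well.
LAMBDA_BG = 0.5       # assumed probability that a word is background noise
ALPHA = 0.1           # assumed additive-smoothing constant
UNKNOWN = "<UNKNOWN>"


def word_prob(counts: Counter, word: str, vocab_size: int) -> float:
    """Additively smoothed P(word | distribution) from (possibly fractional) counts."""
    total = sum(counts.values())
    return (counts[word] + ALPHA) / (total + ALPHA * vocab_size)


def score_candidates(context: list[str], evidence: dict[str, Counter],
                     background: Counter, vocab_size: int) -> dict[str, float]:
    """Posterior P(entity | context) over known candidates plus UNKNOWN (uniform prior)."""
    log_scores = {}
    for entity in [*evidence, UNKNOWN]:
        log_p = 0.0
        for w in context:
            p_bg = word_prob(background, w, vocab_size)
            p_ent = (1.0 / vocab_size if entity == UNKNOWN
                     else word_prob(evidence[entity], w, vocab_size))
            # Each word is a two-component mixture: background topic vs. entity.
            log_p += math.log(LAMBDA_BG * p_bg + (1 - LAMBDA_BG) * p_ent)
        log_scores[entity] = log_p
    # Normalise in log space (log-sum-exp) to avoid underflow on long contexts.
    top = max(log_scores.values())
    z = sum(math.exp(s - top) for s in log_scores.values())
    return {e: math.exp(s - top) / z for e, s in log_scores.items()}


def mine_evidence(context: list[str], evidence: dict[str, Counter], background: Counter,
                  vocab_size: int, threshold: float = 0.9) -> tuple[str, dict[str, float]]:
    """One incremental step: link the mention and, if a known entity wins with
    posterior >= threshold, fold the context back into that entity's evidence.
    Each word adds a fractional count equal to the probability that it was
    generated by the entity rather than by the background topic."""
    posterior = score_candidates(context, evidence, background, vocab_size)
    best = max(posterior, key=posterior.get)
    if best == UNKNOWN or posterior[best] < threshold:
        return best, posterior  # not confident enough: leave the evidence untouched
    counts = evidence[best]
    # Compute all responsibilities first so every word is judged by the same model.
    weights = []
    for w in context:
        p_bg = LAMBDA_BG * word_prob(background, w, vocab_size)
        p_ent = (1 - LAMBDA_BG) * word_prob(counts, w, vocab_size)
        weights.append((w, p_ent / (p_bg + p_ent)))
    for w, r in weights:
        counts[w] += r
    return best, posterior
```

As a usage example under the same assumptions, seeding `evidence` with word counts for two hypothetical candidates (one with "nba" and "bulls", one with "machine" and "learning") and calling `mine_evidence(["bulls", "game", "nba"], evidence, background, 1000)` should link the mention to the first candidate and add most of a count for "bulls" and "nba" but only a small one for "game" when the background `Counter` already explains "game". Weighting each update by the entity-versus-background responsibility is what keeps common words from drifting into an entity's evidence in this toy.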

    Published In

    KDD '13: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining
    August 2013
    1534 pages
    ISBN:9781450321747
    DOI:10.1145/2487575
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. entity disambiguation
    2. evidence mining
    3. generative model
    4. knowledge expansion
    5. semi-supervised learning

    Conference

    KDD '13

    Acceptance Rates

    KDD '13 Paper Acceptance Rate: 125 of 726 submissions, 17%
    Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%

    Article Metrics

    • Downloads (last 12 months): 20
    • Downloads (last 6 weeks): 3
    Reflects downloads up to 02 Mar 2025

    Cited By

    • (2025) Linking Mentions to Entities. Multilingual Entity Linking, pp. 85-109. DOI: 10.1007/978-3-031-74901-8_6. Online publication date: 18-Feb-2025.
    • (2024) Ambiguous Entity Oriented Targeted Document Detection. 2024 IEEE 40th International Conference on Data Engineering (ICDE), pp. 874-886. DOI: 10.1109/ICDE60146.2024.00072. Online publication date: 13-May-2024.
    • (2024) MBJELEL: An End-to-End Knowledge Graph Entity Linking Method Applied to Civil Aviation Emergencies. International Journal of Computational Intelligence Systems, 17:1. DOI: 10.1007/s44196-024-00647-w. Online publication date: 10-Sep-2024.
    • (2022) In-depth analysis of the impact of OCR errors on named entity recognition and linking. Natural Language Engineering, 29:2, pp. 425-448. DOI: 10.1017/S1351324922000110. Online publication date: 18-Mar-2022.
    • (2022) A Module Based Full Cycle Construction Method of Domain-Specific Knowledge Graph. Advances in Artificial Intelligence and Security, pp. 590-603. DOI: 10.1007/978-3-031-06767-9_49. Online publication date: 8-Jul-2022.
    • (2021) Towards holistic Entity Linking: Survey and directions. Information Systems, 95, 101624. DOI: 10.1016/j.is.2020.101624. Online publication date: Jan-2021.
    • (2020) An algorithmic approach to rank the disambiguous entities in Twitter streams for effective semantic search operations. Sādhanā, 45:1. DOI: 10.1007/s12046-019-1247-1. Online publication date: 24-Jan-2020.
    • (2019) Unsupervised Approaches for Textual Semantic Annotation, A Survey. ACM Computing Surveys, 52:4, pp. 1-45. DOI: 10.1145/3324473. Online publication date: 30-Aug-2019.
    • (2019) A Graph-based Approach to Person Name Disambiguation in Web. ACM Transactions on Management Information Systems, 10:2, pp. 1-25. DOI: 10.1145/3314949. Online publication date: 17-May-2019.
    • (2019) Knowledge Fusion: Introduction of Concepts and Techniques. 2019 IEEE Fourth International Conference on Data Science in Cyberspace (DSC), pp. 112-118. DOI: 10.1109/DSC.2019.00025. Online publication date: Jun-2019.
