poster

Mining evidences for named entity disambiguation

Authors:

Xifeng Yan

KDD '13: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining

Pages 1070 - 1078

https://doi.org/10.1145/2487575.2487681

Published: 11 August 2013

Abstract

Named entity disambiguation is the task of disambiguating named entity mentions in natural language text and linking them to their corresponding entries in a knowledge base such as Wikipedia. Such disambiguation can help enhance readability and add semantics to plain text. It is also a central step in constructing a high-quality information network or knowledge graph from unstructured text. Previous research has tackled this problem by making use of various textual and structural features from a knowledge base. Most of the proposed algorithms assume that a knowledge base can provide enough explicit and useful information to help disambiguate a mention to the right entity. However, existing knowledge bases are rarely complete (and likely never will be), which leads to poor performance on short queries whose contexts are not well known. In such cases, we need to collect additional evidence scattered across internal and external corpora to augment the knowledge bases and enhance their disambiguation power. In this work, we propose a generative model and an incremental algorithm to automatically mine useful evidence across documents. By explicitly modeling "background topic" and "unknown entities", our model is able to harvest useful evidence from noisy information. Experimental results show that our proposed method significantly outperforms state-of-the-art approaches, boosting disambiguation accuracy from 43% (baseline) to 86% on short queries derived from tweets.

Index Terms

Mining evidences for named entity disambiguation
1. Applied computing
  1. Document management and text processing

Published In

KDD '13: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining

August 2013

1534 pages

ISBN:9781450321747

DOI:10.1145/2487575

Editors: Rayid Ghani (University of Chicago), Ted E. Senator (SAIC), Paul Bradley (MethodCare, Inc.), Rajesh Parekh (Groupon), Jingrui He (Stevens Institute of Technology)

General Chairs: Robert L. Grossman (University of Chicago and Open Data Group), Ramasamy Uthurusamy (General Motors Corporation, retired)

Program Chairs: Inderjit S. Dhillon (University of Texas), Yehuda Koren (Google)

Copyright © 2013 ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Poster

Conference

KDD '13

KDD '13: The 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

August 11 - 14, 2013

Chicago, Illinois, USA

Acceptance Rates

KDD '13 Paper Acceptance Rate 125 of 726 submissions, 17%;

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
