ABSTRACT
Named entity disambiguation is the task of disambiguating named entity mentions in natural language text and linking them to their corresponding entries in a knowledge base such as Wikipedia. Such disambiguation can help enhance readability and add semantics to plain text. It is also a central step in constructing a high-quality information network or knowledge graph from unstructured text. Previous research has tackled this problem by making use of various textual and structural features from a knowledge base. Most of the proposed algorithms assume that a knowledge base can provide enough explicit and useful information to help disambiguate a mention to the right entity. However, existing knowledge bases are rarely complete (and likely never will be), which leads to poor performance on short queries whose contexts are not well known. In such cases, we need to collect additional evidence scattered across internal and external corpora to augment the knowledge bases and enhance their disambiguation power. In this work, we propose a generative model and an incremental algorithm to automatically mine useful evidence across documents. By explicitly modeling a "background topic" and "unknown entities", our model is able to harvest useful evidence from noisy information. Experimental results show that our proposed method significantly outperforms state-of-the-art approaches, boosting disambiguation accuracy from 43% (baseline) to 86% on short queries derived from tweets.
Index Terms
- Mining evidences for named entity disambiguation