DOI: 10.1145/2487575.2487681 · KDD Conference Proceedings
poster

Mining evidences for named entity disambiguation

Published: 11 August 2013

ABSTRACT

Named entity disambiguation is the task of resolving named entity mentions in natural language text and linking them to their corresponding entries in a knowledge base such as Wikipedia. Such disambiguation can help enhance readability and add semantics to plain text. It is also a central step in constructing high-quality information networks or knowledge graphs from unstructured text. Previous research has tackled this problem by making use of various textual and structural features from a knowledge base. Most of the proposed algorithms assume that a knowledge base can provide enough explicit and useful information to help disambiguate a mention to the right entity. However, existing knowledge bases are rarely complete (and likely never will be), which leads to poor performance on short queries whose contexts are not well known. In such cases, we need to collect additional evidence scattered across internal and external corpora to augment the knowledge bases and enhance their disambiguation power. In this work, we propose a generative model and an incremental algorithm to automatically mine useful evidence across documents. By explicitly modeling a "background topic" and "unknown entities", our model is able to harvest useful evidence from noisy information. Experimental results show that our proposed method significantly outperforms state-of-the-art approaches, boosting disambiguation accuracy from 43% (baseline) to 86% on short queries derived from tweets.
