ABSTRACT
Given a set E of n automatically extracted entities, we would like to group together all the name variants that refer to the same canonical entity. These variants include acronyms, full names, and informal names. We propose using search engine results to cluster the variants of each entity based on the URLs that appear in those results: for each entity e ∈ E, we query the search engine and assign e to the cluster C corresponding to the top search result, creating a new cluster for each distinct top result. Our experiments on a manually created dataset show that this approach achieves higher precision and recall than string-matching and hierarchical-clustering-based disambiguation methods.
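The clustering step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the search engine is replaced by a hypothetical `top_result` callable (here a made-up dictionary, `FAKE_TOP_RESULTS`), and both the URL normalization in `normalize_url` and the singleton handling of names that return no result are assumptions of this sketch.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional
from urllib.parse import urlsplit


def normalize_url(url: str) -> str:
    """Reduce a result URL to a comparison key.

    Assumption: scheme, a leading "www.", host letter case, and a trailing
    slash are treated as irrelevant; the path and query string are kept.
    """
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    key = host + parts.path.rstrip("/")
    if parts.query:
        key += "?" + parts.query
    return key


def cluster_by_top_result(
    entities: List[str],
    top_result: Callable[[str], Optional[str]],
) -> Dict[str, List[str]]:
    """Assign each entity name to the cluster keyed by its top search result.

    `top_result` is a hypothetical stand-in for a search-engine query that
    returns the URL of the first result, or None when nothing comes back.
    """
    clusters: Dict[str, List[str]] = defaultdict(list)
    for e in entities:
        url = top_result(e)
        # Assumption: a name with no result forms its own singleton cluster.
        key = normalize_url(url) if url else f"unresolved:{e}"
        clusters[key].append(e)  # creates the cluster the first time a URL is seen
    return dict(clusters)


# Made-up stand-in for a search engine, for illustration only.
FAKE_TOP_RESULTS = {
    "NSF": "http://www.nsf.gov/",
    "National Science Foundation": "https://www.nsf.gov",
    "NIH": "http://www.nih.gov/",
    "National Institutes of Health": "https://nih.gov/",
}

demo_entities = ["NSF", "National Science Foundation", "NIH",
                 "National Institutes of Health", "Unknown Lab"]
clusters = cluster_by_top_result(demo_entities, FAKE_TOP_RESULTS.get)
for key, names in clusters.items():
    print(key, names)
```

Keying clusters on a normalized URL merges an acronym with its full name whenever the engine returns the same page for both. How aggressively to normalize is a design choice: stripping too much (for example, whole paths) could merge distinct entities that share a host.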
Index Terms
- Entity resolution using search engine results