DOI: 10.1145/2396761.2398641
poster

Entity resolution using search engine results

Published: 29 October 2012

ABSTRACT

Given a set E of n automatically extracted entities, we would like to cluster together all the names that refer to the same canonical entity. The variations of each entity include acronyms, full names, and informal naming conventions. We propose using search engine results to cluster the variations of each entity based on the URLs appearing in those results. For each entity e ∈ E, we query a search engine for e, create a cluster C for the top search result returned, and assign e to C. Our experiments on a manually created dataset show that our approach achieves higher precision and recall than string-matching algorithms and hierarchical-clustering-based disambiguation methods.
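
The clustering step described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the `search` callable stands in for a real search engine API, `fake_search` and its canned results are made up for the example, and the URL normalization in `normalize_url` (dropping the scheme, a leading "www.", and a trailing slash) is an added assumption that the abstract does not specify. Entities whose top results map to the same key end up in the same cluster.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List
from urllib.parse import urlsplit


def normalize_url(url: str) -> str:
    """Reduce a URL to a comparable key (an assumption, not from the paper):
    lowercase host without a leading 'www.', plus the path without a trailing slash."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def cluster_by_top_result(
    entities: Iterable[str],
    search: Callable[[str], List[str]],
) -> Dict[str, List[str]]:
    """Assign each entity to the cluster keyed by its top search result URL."""
    clusters: Dict[str, List[str]] = defaultdict(list)
    for entity in entities:
        results = search(entity)  # ranked list of result URLs, best first
        if results:
            key = normalize_url(results[0])
        else:
            # No results: keep the entity in a singleton cluster of its own.
            key = "no-result:" + entity
        clusters[key].append(entity)
    return dict(clusters)


def fake_search(query: str) -> List[str]:
    """Hypothetical stand-in for a search engine; the results are canned."""
    canned = {
        "ACM": ["https://www.acm.org/"],
        "Association for Computing Machinery": ["http://acm.org"],
        "NSF": ["https://www.nsf.gov/"],
        "National Science Foundation": ["https://nsf.gov"],
    }
    return canned.get(query, [])


names = [
    "ACM",
    "Association for Computing Machinery",
    "NSF",
    "National Science Foundation",
    "Foo Labs",
]
clusters = cluster_by_top_result(names, fake_search)
for key, members in clusters.items():
    print(key, "->", members)
```

Passing the search function as a parameter keeps the sketch independent of any particular search API and makes it easy to run against canned results, as above.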

        • Published in

          CIKM '12: Proceedings of the 21st ACM international conference on Information and knowledge management
          October 2012
          2840 pages
          ISBN: 9781450311564
          DOI: 10.1145/2396761

          Copyright © 2012 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Acceptance Rates

          Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
