poster

Entity resolution using search engine results

Authors:
Madian Khabsa

The Pennsylvania State University, University Park, PA, USA

The Pennsylvania State University, University Park, PA, USA
View Profile

,
Pucktada Treeratpituk

The Pennsylvania State University, University Park, PA, USA

The Pennsylvania State University, University Park, PA, USA
View Profile

,
C. Lee Giles

The Pennsylvania State University, University Park, PA, USA

The Pennsylvania State University, University Park, PA, USA
View Profile

CIKM '12: Proceedings of the 21st ACM international conference on Information and knowledge managementOctober 2012Pages 2363–2366https://doi.org/10.1145/2396761.2398641

Published:29 October 2012Publication History

CIKM '12: Proceedings of the 21st ACM international conference on Information and knowledge management

Pages 2363–2366

ABSTRACT

Given a set of automatically extracted entities E of size n, we would like to cluster all the various names referring to the same canonical entity together. The variations of each entity include acronyms, full name, and informal naming conventions. We propose using search engine results to cluster variations of each entity based on the URLs appearing in those results. We create a cluster C for each top search result returned by querying for the entity e ∈ E assigning e to the cluster C. Our experiments on a manually created dataset shows that our approach achieves higher precision and recall than string matching algorithm and hierarchical clustering based disambiguation methods.

References

Bing blog on navigational queries. http: //www.bing.com/community/site_blogs/b/search/archive/2011/02/10/making-search-yours.aspx, Feb 2011.Google Scholar
J. Artiles, J. Gonzalo, and S. Sekine. The semeval-2007 weps evaluation: Establishing a benchmark for the web people search task. Proceedings of Semeval, pages 64--69, 2007. Google ScholarDigital Library
H. Han, H. Zha, and C. Giles. Name disambiguation in author citations using a k-way spectral clustering method. In Digital Libraries, 2005. JCDL'05. Proceedings of the 5th ACM/IEEE-CS Joint Conference on, pages 334--343. IEEE, 2005.H. Han, H. Zha, and C. Giles. Name disambiguation in author citations using a k-way spectral clustering method. In Digital Libraries, 2005. JCDL'05. Proceedings of the 5th ACM/IEEE-CS Join Conference on, pages 334--343. IEEE, 2005. Google ScholarDigital Library
M. Khabsa, P. Treeratpituk, and C. Giles. Ackseer: a repository and search engine for automatically extracted acknowledgments from digital libraries. In Proceedings of the 12th ACM/IEEE-CS joint conference on Digital Libraries, pages 185--194. ACM, 2012. Google ScholarDigital Library
T. Liu. Learning to rank for information retrieval. Foundations and Trends in Information Retrieval, 3(3):225--331, 2009. Google ScholarDigital Library
G. Mann and D. Yarowsky. Unsupervised personal name disambiguation. In Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003-Volume 4, pages 33--40. Google ScholarDigital Library
G. Mann and D. Yarowsky. Unsupervised personal name disambiguation. In Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003-Volume 4, pages 33--40. Google ScholarDigital Library
G. Mann and D. Yarowsky. Unsupervised personal name disambiguation. In Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003-Volume 4, pages 33--40. Google ScholarDigital Library
J. Pustejovsky, J. Castano, B. Cochran, M. Kotecki, and M. Morrell. Automatic extraction of acronym-meaning pairs from medline databases. Studies in health technology and informatics, (1):371--375, 2001.Google Scholar
Y. F. Tan, M. Y. Kan, and D. Lee. Search engine driven author disambiguation. In Proceedings of the 6th ACM/IEEE-CS joint conference on Digital libraries, JCDL '06, pages 314--315. ACM, 2006. Google ScholarDigital Library
N. Wacholder, Y. Ravin, and M. Choi. Disambiguation of proper names in text. In Proceedings of the fifth conference on Applied natural language processing, pages 202--208. Association for Computational Linguistics, 1997. Google ScholarDigital Library

Index Terms

Entity resolution using search engine results
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Language resources
2. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Clustering and classification
  2. Information systems applications
    1. Data mining
      1. Clustering

Recommendations

Entity disambiguation with hierarchical topic models
KDD '11: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining

Disambiguating entity references by annotating them with unique ids from a catalog is a critical step in the enrichment of unstructured content. In this paper, we show that topic models, such as Latent Dirichlet Allocation (LDA) and its hierarchical ...
Read More
Evaluating Entity Linking with Wikipedia

Named Entity Linking (nel) grounds entity mentions to their corresponding node in a Knowledge Base (kb). Recently, a number of systems have been proposed for linking entity mentions in text to Wikipedia pages. Such systems typically search for candidate ...
Read More
WEST: Modern Technologies for Web People Search
ICDE '09: Proceedings of the 2009 IEEE International Conference on Data Engineering

In this paper we describe WEST (Web Entity Search Technologies) system that we have developed to improve people search over the Internet. Recently the problem of Web People Search (WePS) has attracted significant attention from both the industry and ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
CIKM '12: Proceedings of the 21st ACM international conference on Information and knowledge management
October 2012
2840 pages
ISBN:9781450311564
DOI:10.1145/2396761
General Chair:
Xuewen Chen
Wayne State University, USA
,
Program Chairs:
Guy Lebanon
Georgia Institute of Technology
,
Haixun Wang
Microsoft Research Asia
,
Mohammed J. Zaki
Rensselaer Polytechnic Institute
Copyright © 2012 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 29 October 2012
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
disambiguation
entity resolution
search engines
Qualifiers
- poster
Conference

Acceptance Rates
Overall Acceptance Rate1,861of8,427submissions,22%
Upcoming Conference
CIKM '24

Sponsor:

sigir

sigir

The 33rd ACM International Conference on Information and Knowledge Management

October 21 - 25, 2024

Boise , ID , USA
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 4
  Total Citations
  View Citations
- 152
  Total Downloads
- Downloads (Last 12 months)0
- Downloads (Last 6 weeks)0
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Entity resolution using search engine results

CIKM '12: Proceedings of the 21st ACM international conference on Information and knowledge management

ABSTRACT

References

Cited By

Index Terms

Recommendations

Entity disambiguation with hierarchical topic models

Evaluating Entity Linking with Wikipedia

WEST: Modern Technologies for Web People Search

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Upcoming Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Entity resolution using search engine results

CIKM '12: Proceedings of the 21st ACM international conference on Information and knowledge management

ABSTRACT

References

Cited By

Index Terms

Recommendations

Entity disambiguation with hierarchical topic models

Evaluating Entity Linking with Wikipedia

WEST: Modern Technologies for Web People Search

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Upcoming Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media