Resolving Person Names in Web People Search

Balog, Krisztian; Azzopardi, Leif; de Rijke, Maarten

doi:10.1007/978-3-642-00570-1_15

Abstract

Disambiguating person names in a set of documents (such as a set of web pages returned in response to a person name query) is a key task for the presentation of results and the automatic profiling of experts. With largely unstructured documents and an unknown number of people sharing the same name, the problem presents many challenges. This chapter treats person name disambiguation as a document clustering problem, where it is assumed that the documents represent particular people. This leads to the person cluster hypothesis, which states that similar documents tend to represent the same person. Single Pass Clustering, k-Means Clustering, Agglomerative Clustering and Probabilistic Latent Semantic Analysis are employed and empirically evaluated in this context. On the SemEval 2007 Web People Search task, it is shown that the person cluster hypothesis holds reasonably well and that the Single Pass Clustering and Agglomerative Clustering methods provide the best performance.
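
To make the clustering view concrete, the following is a minimal sketch of single pass clustering applied to documents that mention the same name. The abstract does not specify the document representation, similarity measure, or threshold used in the chapter, so the choices here (raw term-frequency vectors, cosine similarity, summed-vector centroids, a threshold of 0.3, and the function names) are all illustrative assumptions rather than the authors' configuration.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector (lowercased, whitespace-tokenized)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_pass_cluster(docs, threshold=0.3):
    """Process documents once, in order. Each document joins the most
    similar existing cluster if that similarity reaches `threshold`
    (an assumed value); otherwise it starts a new cluster.
    Returns one cluster label per document."""
    centroids = []  # one summed term-frequency vector per cluster
    labels = []
    for text in docs:
        vec = tf_vector(text)
        best, best_sim = None, 0.0
        for i, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = i, sim
        if best is not None and best_sim >= threshold:
            centroids[best].update(vec)  # Counter.update adds the counts
            labels.append(best)
        else:
            centroids.append(Counter(vec))
            labels.append(len(centroids) - 1)
    return labels
```

Under the person cluster hypothesis, each resulting cluster is read as one person. Because documents are processed once and in order, the result depends on both the input order and the threshold, and the number of clusters does not have to be fixed in advance, which suits a setting where the number of people sharing a name is unknown.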

Author information

Authors and Affiliations

ISLA, University of Amsterdam, Amsterdam, The Netherlands
Krisztian Balog
DCS, University of Glasgow, Glasgow, UK
Leif Azzopardi
ISLA, University of Amsterdam, Amsterdam, The Netherlands
Maarten de Rijke

Corresponding author

Correspondence to Krisztian Balog.

About this chapter

Cite this chapter

Balog, K., Azzopardi, L., de Rijke, M. (2009). Resolving Person Names in Web People Search. In: King, I., Baeza-Yates, R. (eds) Weaving Services and People on the World Wide Web. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00570-1_15

DOI: https://doi.org/10.1007/978-3-642-00570-1_15
Published: 17 April 2009
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00569-5
Online ISBN: 978-3-642-00570-1
eBook Packages: Computer Science, Computer Science (R0)
