Abstract
Searching the Web for people appears to be one of the most common activities of Internet users. However, person names can be highly ambiguous, so search results are often a collection of documents about different people who share the same name. This paper presents a cross-document coreference system that identifies person names in different documents that refer to the same person entity. The system exploits background knowledge through two mechanisms: (1) a dynamic similarity threshold for clustering person names, which depends on the ambiguity of the name as estimated from a phonebook; and (2) the disambiguation of names against a knowledge base containing person descriptions, using an entity linking system whose output is included as an additional feature for computing similarity. The paper describes the system and reports its performance in the News People Search (NePS) task at Evalita 2011. A version of the system is being used in a real-world application that requires coreference resolution over millions of names from multimedia sources.
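The two mechanisms can be illustrated with a minimal sketch. It is not the authors' implementation: the phonebook counts, the log scaling, the constants, the `kb_id` field, the feature weight, and the assumption that more ambiguous names get a higher threshold are all hypothetical choices made for illustration.

```python
import math
from collections import Counter

# Hypothetical phonebook: number of listed people per full name (made-up values).
PHONEBOOK = {"mario rossi": 950, "paolo bianchi": 120, "remo zanoli": 1}


def dynamic_threshold(name, phonebook, low=0.2, high=0.8, cap=1000):
    """Map name ambiguity (phonebook frequency) to a similarity threshold.

    Assumption: rare names get a threshold near `low` (merge easily), common
    names one near `high` (merge only with strong evidence). The log scaling
    and all constants are illustrative, not taken from the paper.
    """
    count = min(phonebook.get(name.lower(), 1), cap)
    ambiguity = math.log(count) / math.log(cap)  # 0.0 for count 1, 1.0 at cap
    return low + (high - low) * ambiguity


def cosine(tokens_a, tokens_b):
    """Cosine similarity between two bags of context tokens."""
    ca, cb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def similarity(m1, m2, link_weight=0.3):
    """Context similarity plus an entity-link agreement feature.

    `kb_id` stands in for the output of an entity linking system (hypothetical
    field); agreement on a non-null identifier adds `link_weight` to the score.
    """
    score = (1 - link_weight) * cosine(m1["context"], m2["context"])
    if m1.get("kb_id") is not None and m1.get("kb_id") == m2.get("kb_id"):
        score += link_weight
    return score


def should_merge(m1, m2, phonebook=PHONEBOOK):
    """Merge two mentions of the same name if similarity clears its threshold."""
    return similarity(m1, m2) >= dynamic_threshold(m1["name"], phonebook)
```

With these made-up values, two mentions of a rare name that share only part of their context would be merged, while the same evidence would not suffice for a very common name.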
This research was supported by the LiveMemories project, funded by the Provincia Autonoma of Trento (http://www.livememories.org/).
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zanoli, R., Corcoglioniti, F., Girardi, C. (2013). Exploiting Background Knowledge for Clustering Person Names. In: Magnini, B., Cutugno, F., Falcone, M., Pianta, E. (eds) Evaluation of Natural Language and Speech Tools for Italian. EVALITA 2012. Lecture Notes in Computer Science, vol 7689. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35828-9_15
DOI: https://doi.org/10.1007/978-3-642-35828-9_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35827-2
Online ISBN: 978-3-642-35828-9
eBook Packages: Computer Science, Computer Science (R0)