
Exploiting Background Knowledge for Clustering Person Names

  • Conference paper
Evaluation of Natural Language and Speech Tools for Italian (EVALITA 2012)

Abstract

Nowadays, searching the Web for people appears to be one of the most common activities of Internet users. However, person names can be highly ambiguous, and consequently search results are often a collection of documents about different people who share the same name. This paper presents a cross-document coreference system that identifies person names in different documents that refer to the same person entity. The system exploits background knowledge through two mechanisms: (1) a dynamic similarity threshold for clustering person names, which depends on the ambiguity of the name as estimated from a phonebook; and (2) the disambiguation of names against a knowledge base of person descriptions, using an entity linking system whose output is included as an additional feature for computing similarity. The paper describes the system and reports its performance in the News People Search (NePS) task at Evalita 2011. A version of the system is being used in a real-world application that must corefer millions of names from multimedia sources.
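The two mechanisms described above can be illustrated with a small sketch. This is not the authors' implementation: the threshold formula in the hypothetical `dynamic_threshold` (a log-scaled function of how many phonebook entries share the name), the bag-of-words cosine similarity, the hypothetical `w_link` weight given to the entity-linking feature, the `kb_id` field, and the single-link (union-find) clustering are all assumptions chosen for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def dynamic_threshold(name, phonebook_counts, base=0.2, step=0.05, cap=0.8):
    # Hypothetical formula: the more phonebook entries share this name
    # (i.e. the more ambiguous it is), the higher the similarity required to merge.
    n = phonebook_counts.get(name, 0)
    return min(cap, base + step * math.log1p(n))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two bag-of-words context vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def similarity(m1, m2, w_link=0.3):
    # Context similarity, combined with an (assumed) entity-linking feature
    # only when both mentions were linked to a knowledge-base entry.
    s = cosine(m1["context"], m2["context"])
    k1, k2 = m1.get("kb_id"), m2.get("kb_id")
    if k1 is not None and k2 is not None:
        link = 1.0 if k1 == k2 else 0.0
        return (1 - w_link) * s + w_link * link
    return s


def cluster_mentions(mentions, phonebook_counts):
    # Single-link clustering with union-find: two mentions of the same name
    # are merged when their similarity reaches the name-specific threshold.
    parent = list(range(len(mentions)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(mentions)), 2):
        name = mentions[i]["name"]
        if name != mentions[j]["name"]:
            continue  # only mentions sharing a surface name are compared
        if similarity(mentions[i], mentions[j]) >= dynamic_threshold(name, phonebook_counts):
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(mentions)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

With these assumed parameters, two mentions of a common name (5000 phonebook entries) whose contexts have cosine similarity 0.5 stay apart, because the threshold is about 0.63, while two mentions of a rare name (1 entry) with the same similarity are merged, because the threshold is about 0.23. If both common-name mentions are linked to the same knowledge-base entry, the combined score rises to 0.65 and they are merged.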

This research was supported by the LiveMemories project, funded by the Provincia Autonoma di Trento (http://www.livememories.org/).




Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zanoli, R., Corcoglioniti, F., Girardi, C. (2013). Exploiting Background Knowledge for Clustering Person Names. In: Magnini, B., Cutugno, F., Falcone, M., Pianta, E. (eds) Evaluation of Natural Language and Speech Tools for Italian. EVALITA 2012. Lecture Notes in Computer Science, vol 7689. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35828-9_15


  • DOI: https://doi.org/10.1007/978-3-642-35828-9_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-35827-2

  • Online ISBN: 978-3-642-35828-9

  • eBook Packages: Computer Science (R0)
