Person Name Disambiguation in Web Pages Using Social Network, Compound Words and Latent Topics

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2008)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5012)

Abstract

The World Wide Web (WWW) provides much information about persons, and in recent years Web search engines have been commonly used to learn about them. However, many persons share the same name, and this ambiguity typically causes the search results for one person name to include Web pages about several different persons. We propose a novel framework for person name disambiguation that has the following three component processes: (1) extraction of social network information by finding co-occurrences of named entities, (2) measurement of document similarities based on occurrences of key compound words, and (3) inference of topic information from documents based on the Dirichlet process unigram mixture model. Experiments using an actual Web document dataset show that the results of our framework are promising.
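As a rough illustration of how the first two components could feed a grouping step, the sketch below combines named-entity overlap and key-compound-word overlap into one score and merges pages whose score reaches a threshold. Everything in it is an assumption made for illustration, not the authors' method: the `entities` and `terms` keys, the overlap coefficient, the weights, the threshold, and the single-link merging are all hypothetical, and the Dirichlet process unigram mixture component is not covered.

```python
from itertools import combinations


def overlap(a, b):
    """Overlap coefficient between two sets (0.0 when either is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def cluster_pages(pages, w_entity=0.5, w_term=0.5, threshold=0.3):
    """Group pages whose combined similarity reaches the threshold.

    pages: list of dicts with hypothetical keys "entities" (co-occurring
    named entities) and "terms" (key compound words), each a set of strings.
    Returns a sorted list of page-index lists, one per inferred person.
    """
    parent = list(range(len(pages)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(pages)), 2):
        sim = (w_entity * overlap(pages[i]["entities"], pages[j]["entities"])
               + w_term * overlap(pages[i]["terms"], pages[j]["terms"]))
        if sim >= threshold:
            # Single-link merge: one sufficiently similar pair joins two groups.
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(pages)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy search results for one ambiguous name (invented data).
pages = [
    {"entities": {"Alice Smith", "MIT"}, "terms": {"machine learning"}},
    {"entities": {"MIT", "Bob Lee"}, "terms": {"machine learning", "neural network"}},
    {"entities": {"City Orchestra"}, "terms": {"violin concerto"}},
]
clusters = cluster_pages(pages)
```

With this toy input the first two pages share an entity and a compound word, so they end up in one group, while the third page stays on its own.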



Editor information

Takashi Washio, Einoshin Suzuki, Kai Ming Ting, Akihiro Inokuchi

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ono, S., Sato, I., Yoshida, M., Nakagawa, H. (2008). Person Name Disambiguation in Web Pages Using Social Network, Compound Words and Latent Topics. In: Washio, T., Suzuki, E., Ting, K.M., Inokuchi, A. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2008. Lecture Notes in Computer Science, vol 5012. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68125-0_24

  • DOI: https://doi.org/10.1007/978-3-540-68125-0_24

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-68124-3

  • Online ISBN: 978-3-540-68125-0

  • eBook Packages: Computer Science, Computer Science (R0)
