
A constraint-based topic modeling approach for name disambiguation

  • Research Article
Frontiers of Computer Science in China

Abstract

Name ambiguity refers to the problem that different people may be referred to by an identical name. This problem has become critical in many applications, particularly in online bibliography systems such as DBLP and CiteSeer. Although much work has been conducted to address it, many challenges remain. In this paper, a general framework of constraint-based topic modeling is proposed, which makes use of user-defined constraints to improve the performance of name disambiguation. A Gibbs sampling algorithm that integrates the constraints is proposed to perform inference in the topic model. Experimental results on a real-world dataset show that the proposed approach yields significant improvements.
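The abstract does not give the model equations, so what follows is only a minimal sketch of how constraints could enter a collapsed Gibbs sampler. It is not the authors' formulation. The assumptions are hypothetical. Each paper carrying the ambiguous name is assigned to exactly one latent person, which makes the model a Dirichlet-multinomial mixture rather than full LDA. A constraint is a pair of papers assumed to belong to the same person, for example two papers that share a co-author. The constraint contributes a soft reward `exp(lam * k)`, where `k` counts the constrained neighbours already assigned to the candidate person. The function name `constrained_gibbs` and all of its parameters are illustrative.

```python
import math
import random
from collections import defaultdict


def constrained_gibbs(docs, constraints, K, V, alpha=0.1, beta=0.01,
                      lam=2.0, iters=200, seed=0):
    """Hypothetical constraint-aware collapsed Gibbs sampler (illustrative only).

    docs        : list of papers, each a list of word ids in [0, V)
    constraints : list of (i, j) paper-index pairs assumed to share an author
    K           : number of candidate persons (latent clusters)
    lam         : hypothetical weight of the soft constraint reward
    Returns one cluster label per paper.
    """
    rng = random.Random(seed)
    neighbors = defaultdict(list)
    for i, j in constraints:
        neighbors[i].append(j)
        neighbors[j].append(i)

    # Random initial assignment plus the sufficient statistics it implies.
    z = [rng.randrange(K) for _ in docs]
    m = [0] * K                          # papers per cluster
    n_kw = [[0] * V for _ in range(K)]   # word counts per cluster
    n_k = [0] * K                        # total words per cluster
    for d, words in enumerate(docs):
        m[z[d]] += 1
        n_k[z[d]] += len(words)
        for w in words:
            n_kw[z[d]][w] += 1

    for _ in range(iters):
        for d, words in enumerate(docs):
            # Remove paper d from the counts (the "-d" statistics).
            k_old = z[d]
            m[k_old] -= 1
            n_k[k_old] -= len(words)
            for w in words:
                n_kw[k_old][w] -= 1

            counts = defaultdict(int)
            for w in words:
                counts[w] += 1

            log_p = []
            for k in range(K):
                lp = math.log(m[k] + alpha)
                # Dirichlet-multinomial predictive of the whole paper under cluster k.
                for w, c in counts.items():
                    for j in range(c):
                        lp += math.log(n_kw[k][w] + beta + j)
                for i in range(len(words)):
                    lp -= math.log(n_k[k] + V * beta + i)
                # Hypothetical soft constraint: reward linked papers already in k.
                lp += lam * sum(1 for e in neighbors[d] if z[e] == k)
                log_p.append(lp)

            # Sample a new cluster from the normalized (max-shifted) weights.
            mx = max(log_p)
            weights = [math.exp(x - mx) for x in log_p]
            k_new = rng.choices(range(K), weights=weights)[0]

            # Add paper d back under its new assignment.
            z[d] = k_new
            m[k_new] += 1
            n_k[k_new] += len(words)
            for w in words:
                n_kw[k_new][w] += 1
    return z
```

For example, `constrained_gibbs([[0, 1, 2], [0, 1, 1], [5, 6, 7], [6, 7, 7]], [(0, 1), (2, 3)], K=2, V=8)` returns one person label per paper. A large `lam` makes each constraint behave almost like a hard must-link rule. A small `lam` lets the word evidence override it.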



Author information

Correspondence to Jie Tang.


About this article

Cite this article

Wang, F., Tang, J., Li, J. et al. A constraint-based topic modeling approach for name disambiguation. Front. Comput. Sci. China 4, 100–111 (2010). https://doi.org/10.1007/s11704-009-0064-9

