Abstract
Name ambiguity is the problem that different people may be referred to by an identical name. This problem has become critical in many applications, particularly in online bibliography systems such as DBLP and CiteSeer. Although much work has been done to address it, many challenges remain. This paper proposes a general framework of constraint-based topic modeling that uses user-defined constraints to improve name disambiguation performance. A Gibbs sampling algorithm that integrates the constraints is proposed for inference in the topic model. Experimental results on a real-world dataset show that the proposed approach yields significant improvements.
Cite this article
Wang, F., Tang, J., Li, J. et al. A constraint-based topic modeling approach for name disambiguation. Front. Comput. Sci. China 4, 100–111 (2010). https://doi.org/10.1007/s11704-009-0064-9