ABSTRACT
Name ambiguity has long been a challenging problem. In this paper, we present a thorough investigation of it. Specifically, we formalize name disambiguation in a unified framework that incorporates both attribute and relationship information into a probabilistic model. We explore a dynamic approach for automatically estimating the number of persons K, and employ an adaptive distance measure to estimate the distance between objects. Experimental results show that the proposed framework significantly outperforms the baseline method.
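To make the idea of not fixing K in advance more concrete, the following is a minimal sketch, not the method of the paper. It assumes each paper is represented by a small numeric feature vector, uses a weighted squared Euclidean distance as a stand-in for an adaptive distance measure, and swaps in plain average-linkage agglomerative clustering with a stopping threshold. The names `weighted_distance`, `cluster_distance`, `estimate_clusters`, the weights, and the threshold are all hypothetical.

```python
from itertools import combinations


def weighted_distance(a, b, weights):
    """Weighted squared Euclidean distance between two feature vectors.

    The per-feature weights are hypothetical; in an adaptive measure
    they would be learned from data rather than set by hand.
    """
    return sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b))


def cluster_distance(c1, c2, points, weights):
    """Average-linkage distance between two clusters of point indices."""
    total = sum(
        weighted_distance(points[i], points[j], weights)
        for i in c1
        for j in c2
    )
    return total / (len(c1) * len(c2))


def estimate_clusters(points, weights, threshold):
    """Merge the closest pair of clusters until none is closer than threshold.

    K is not supplied; it is simply the number of clusters left
    when merging stops.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > 1:
        (a, b), dist = min(
            (
                ((a, b), cluster_distance(clusters[a], clusters[b], points, weights))
                for a, b in combinations(range(len(clusters)), 2)
            ),
            key=lambda item: item[1],
        )
        if dist > threshold:
            break
        # combinations() guarantees a < b, so deleting b leaves a valid.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Toy data (assumed): three well-separated groups of four "papers" each.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (20, 0), (20, 1), (21, 0), (21, 1),
]
clusters = estimate_clusters(points, weights=(1.0, 1.0), threshold=10.0)
print(len(clusters), "clusters:", clusters)
```

Changing the weights changes which objects count as close, and therefore the estimated K: with weights `(0.0, 1.0)` the first feature is ignored, and the first and third groups collapse into one cluster.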
Index Terms
- A unified framework for name disambiguation