Abstract
The name ambiguity problem poses many challenges for scholar finding, citation analysis and related research fields. To address this issue, researchers have proposed various disambiguation methods, each relying on its own set of disambiguation features. In this paper, we propose an unsupervised hierarchical agglomerative clustering algorithm for author disambiguation based on Dempster–Shafer theory (DST). Unlike existing methods, ours uses DST together with Shannon's entropy to fuse multiple disambiguation features and to select a more reliable candidate pair of clusters to merge in each iteration of clustering. We also propose several ways to determine the convergence condition of the clustering process. Experiments show that our method outperforms three unsupervised models and performs comparably to a supervised model, while requiring no hand-labelled training data.
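The abstract names the ingredients (DST evidence fusion, Shannon's entropy, agglomerative merging of the most reliable cluster pair, a convergence condition) without giving formulas. The Python sketch below shows one plausible way they could fit together; it is not the authors' algorithm. Every modelling choice here is an assumption made for illustration: the two-element frame {same author, different authors}; entropy-based reliability weights applied through Shafer-style discounting; average-link cluster similarity; the `MAX_RELIABILITY` cap; and a fixed belief `threshold` as the stopping rule. All function names are hypothetical.

```python
import math
from itertools import combinations

# Frame of discernment for a candidate cluster pair: "S" = same author,
# "D" = different authors, "SD" = the whole frame (ignorance).
MAX_RELIABILITY = 0.95  # assumed cap so every source keeps some ignorance


def entropy_reliability(values):
    """Hypothetical weighting: a feature whose similarities are concentrated on
    few candidate pairs (low Shannon entropy) is treated as more reliable."""
    total = sum(values)
    n = len(values)
    if total <= 0.0:
        return 0.0  # no positive evidence at all: treat the feature as uninformative
    if n < 2:
        return MAX_RELIABILITY  # a single candidate pair: nothing to compare against
    probs = [v / total for v in values if v > 0.0]
    h = -sum(p * math.log(p) for p in probs)
    return max(0.0, MAX_RELIABILITY * (1.0 - h / math.log(n)))


def mass_from_similarity(sim, reliability):
    """Turn a similarity in [0, 1] into a mass function, discounted by reliability."""
    return {"S": reliability * sim, "D": reliability * (1.0 - sim), "SD": 1.0 - reliability}


def dempster_combine(m1, m2):
    """Dempster's rule of combination on the two-element frame {S, D}."""
    conflict = m1["S"] * m2["D"] + m1["D"] * m2["S"]  # mass assigned to the empty set
    norm = 1.0 - conflict
    s = m1["S"] * m2["S"] + m1["S"] * m2["SD"] + m1["SD"] * m2["S"]
    d = m1["D"] * m2["D"] + m1["D"] * m2["SD"] + m1["SD"] * m2["D"]
    return {"S": s / norm, "D": d / norm, "SD": m1["SD"] * m2["SD"] / norm}


def average_link(c1, c2, feature):
    """Average pairwise similarity between two clusters for one feature."""
    return sum(feature(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))


def dst_agglomerative(records, features, threshold=0.5):
    """Merge the pair with the highest fused belief in "S"; stop below threshold."""
    clusters = [[r] for r in records]
    while len(clusters) > 1:
        pairs = list(combinations(range(len(clusters)), 2))
        sims = {name: [average_link(clusters[i], clusters[j], f) for i, j in pairs]
                for name, f in features.items()}
        weights = {name: entropy_reliability(vals) for name, vals in sims.items()}
        best_pair, best_belief = None, -1.0
        for k, pair in enumerate(pairs):
            fused = {"S": 0.0, "D": 0.0, "SD": 1.0}  # vacuous mass function
            for name in features:
                fused = dempster_combine(fused, mass_from_similarity(sims[name][k], weights[name]))
            if fused["S"] > best_belief:
                best_pair, best_belief = pair, fused["S"]
        if best_belief < threshold:
            break  # assumed convergence rule: no pair is believed to share an author
        i, j = best_pair  # i < j, so deleting index j leaves index i valid
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```

As a usage sketch, `features` could map names to hypothetical functions such as a co-author Jaccard similarity and a same-venue indicator, each returning a value in [0, 1]. `dst_agglomerative(records, features)` then keeps merging until no pair's fused belief in "same author" reaches the threshold. The entropy cap keeps every source's mass on ignorance positive, so Dempster's normalisation factor never reaches zero.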
Acknowledgments
This work is supported by the Scientific Research Project of Yunnan University (2010YB024) and the Applied Basic Research Project of Yunnan Province (2013FB009). The work of Jun He is supported by the National Natural Science Foundation of China (61203273). We are grateful to the anonymous reviewers for their useful comments and suggestions, which substantially improved this paper.
Cite this article
Wu, H., Li, B., Pei, Y. et al. Unsupervised author disambiguation using Dempster–Shafer theory. Scientometrics 101, 1955–1972 (2014). https://doi.org/10.1007/s11192-014-1283-x