Unsupervised author disambiguation using Dempster–Shafer theory

Wu, Hao; Li, Bo; Pei, Yijian; He, Jun

doi:10.1007/s11192-014-1283-x

Published: 20 April 2014

Scientometrics, Volume 101, pages 1955–1972 (2014)

Hao Wu¹, Bo Li¹, Yijian Pei¹ & Jun He²

Abstract

The name ambiguity problem poses many challenges for scholar finding, citation analysis and other related research fields. To address this issue, various disambiguation methods using different disambiguation features have been put forward. In this paper, we propose an unsupervised hierarchical agglomerative clustering algorithm based on Dempster–Shafer theory (DST) for author disambiguation tasks. Unlike existing methods, we use DST in combination with Shannon’s entropy to fuse various disambiguation features and to select a more reliable candidate pair of clusters for merging in each iteration of clustering. We also propose several solutions for determining the convergence condition of the clustering process. In experiments, our method outperforms three unsupervised models and achieves performance comparable to that of a supervised model, while requiring no hand-labelled training data.
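
The abstract names the ingredients (Dempster–Shafer evidence fusion, Shannon’s entropy, and merge selection inside hierarchical agglomerative clustering) but not how the paper instantiates them, so the Python sketch below is illustrative only and is not the authors’ algorithm. Dempster’s rule of combination and the pignistic probability BetP(S) = m(S) + m(Θ)/2 are standard. Everything else is an assumption: the mapping from a feature similarity to a mass function through a reliability factor (`feature_to_mass`), the Jaccard features over hypothetical `coauthors` and `venues` fields (`field_similarity`), the entropy-penalised merge score, and the stopping rule that ends clustering once no pair has BetP(S) above 0.5.

```python
from itertools import combinations
from math import log2
from typing import Callable

# Frame of discernment for one candidate pair of clusters:
# "S" = same author, "D" = different authors, "SD" = Theta (uncommitted mass).
Mass = dict[str, float]
Record = dict[str, set[str]]
Feature = Callable[[list[Record], list[Record]], float]


def feature_to_mass(similarity: float, reliability: float) -> Mass:
    """Hypothetical mapping of a similarity in [0, 1] to a mass function."""
    return {
        "S": reliability * similarity,
        "D": reliability * (1.0 - similarity),
        "SD": 1.0 - reliability,
    }


def combine(m1: Mass, m2: Mass) -> Mass:
    """Dempster's rule of combination on the frame {S, D}."""
    out: Mass = {"S": 0.0, "D": 0.0, "SD": 0.0}
    conflict = 0.0
    for a, ma in m1.items():
        for b, mb in m2.items():
            inter = "".join(h for h in "SD" if h in a and h in b)
            if inter:
                out[inter] += ma * mb
            else:
                conflict += ma * mb  # product mass that falls on the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: the masses cannot be combined")
    return {k: v / (1.0 - conflict) for k, v in out.items()}


def mass_entropy(m: Mass) -> float:
    """Shannon entropy (bits) of the mass vector, used here as an uncertainty proxy."""
    return -sum(v * log2(v) for v in m.values() if v > 0)


def field_similarity(field: str) -> Feature:
    """Hypothetical feature: Jaccard overlap of one record field across two clusters."""
    def sim(c1: list[Record], c2: list[Record]) -> float:
        u1 = set().union(*(r[field] for r in c1))
        u2 = set().union(*(r[field] for r in c2))
        union = u1 | u2
        return len(u1 & u2) / len(union) if union else 0.0
    return sim


def fuse(c1: list[Record], c2: list[Record],
         features: list[tuple[Feature, float]]) -> Mass:
    """Fuse all feature-level masses for one cluster pair, starting from total ignorance."""
    fused: Mass = {"S": 0.0, "D": 0.0, "SD": 1.0}
    for feature, reliability in features:
        fused = combine(fused, feature_to_mass(feature(c1, c2), reliability))
    return fused


def cluster(records: list[Record], features: list[tuple[Feature, float]],
            threshold: float = 0.5) -> list[list[Record]]:
    """Agglomerative clustering that merges the most reliable candidate pair each round."""
    clusters = [[r] for r in records]
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            m = fuse(clusters[i], clusters[j], features)
            p_same = m["S"] + m["SD"] / 2  # pignistic probability of "same author"
            if p_same <= threshold:
                continue  # the fused evidence does not favour merging this pair
            # Penalise pairs whose fused evidence is uncertain (high entropy).
            score = p_same * (1.0 - mass_entropy(m) / log2(3))
            if best is None or score > best[0]:
                best = (score, i, j)
        if best is None:  # hypothetical convergence rule: no pair passes the threshold
            break
        _, i, j = best
        clusters[i].extend(clusters[j])  # i < j, so deleting j leaves i valid
        del clusters[j]
    return clusters


if __name__ == "__main__":
    demo = [
        {"coauthors": {"A", "B"}, "venues": {"V1"}},
        {"coauthors": {"A", "C"}, "venues": {"V1"}},
        {"coauthors": {"X"}, "venues": {"V2"}},
    ]
    feats = [(field_similarity("coauthors"), 0.8), (field_similarity("venues"), 0.6)]
    print([len(c) for c in cluster(demo, feats)])  # prints [2, 1]
```

In the demo, the two records that share a coauthor and a venue are merged, while the third never accumulates enough evidence for "same author" and stays apart. The sketch recomputes every pairwise mass in each round, which is acceptable for illustration; a practical implementation would cache pairwise evidence between rounds.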

Similar content being viewed by others

On the combination of domain-specific heuristics for author name disambiguation: the nearest cluster method (Article, 07 July 2015)

Author Disambiguation

Author Name Disambiguation by Exploiting Graph Structural Clustering and Hybrid Similarity (Article, 16 February 2018)

Acknowledgments

This work is supported by the Scientific Research Project of Yunnan University (2010YB024) and the Applied Basic Research Project of Yunnan Province (2013FB009). The work of Jun He is supported by the National Natural Science Foundation of China (61203273). We are grateful to the anonymous reviewers for their useful comments and suggestions, which helped to substantially improve this paper.

Author information

Authors and Affiliations

School of Information Science and Engineering, Yunnan University, Kunming, 650091, China
Hao Wu, Bo Li & Yijian Pei
Nanjing University of Information Science and Technology, Nanjing, 210044, China
Jun He

Corresponding author

Correspondence to Hao Wu.

About this article

Cite this article

Wu, H., Li, B., Pei, Y. et al. Unsupervised author disambiguation using Dempster–Shafer theory. Scientometrics 101, 1955–1972 (2014). https://doi.org/10.1007/s11192-014-1283-x

Received: 12 December 2013
Issue Date: December 2014
DOI: https://doi.org/10.1007/s11192-014-1283-x
