Unsupervised author disambiguation using Dempster–Shafer theory

Scientometrics

Abstract

The name ambiguity problem presents many challenges for scholar finding, citation analysis and other related research fields. To address this issue, various disambiguation methods drawing on different disambiguation features have been put forward. In this paper, we propose an unsupervised hierarchical agglomerative clustering algorithm based on Dempster–Shafer theory (DST) for author disambiguation. Unlike existing methods, we use DST in combination with Shannon’s entropy to fuse various disambiguation features and select a more reliable candidate pair of clusters to merge in each clustering iteration. We also propose several ways to determine the convergence condition of the clustering process. In our experiments, the method outperforms three unsupervised models and achieves performance comparable to that of a supervised model, while requiring no hand-labelled training data.
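
As an illustration of the kind of evidence fusion the abstract refers to, the following is a minimal sketch of Dempster's rule of combination applied to two disambiguation features. It is not the authors' implementation: the feature names (`coauthor`, `venue`), the mass values, and the two-hypothesis frame {"same", "diff"} are assumptions chosen for the example, and the entropy-based weighting and the clustering loop described in the paper are omitted.

```python
from itertools import product


def combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    Each mass function maps a frozenset (a subset of the frame of
    discernment) to its mass; the masses of each function sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            # Agreeing evidence supports the intersection of the two sets.
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            # Disjoint sets contribute to the conflict mass K.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("total conflict: the sources cannot be combined")
    # Normalise by 1 - K so the fused masses sum to 1 again.
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


# Hypothetical frame: do two clusters belong to the same author or not?
SAME = frozenset({"same"})
DIFF = frozenset({"diff"})
THETA = SAME | DIFF  # the whole frame, i.e. ignorance

# Hypothetical masses derived from two features (values are illustrative only).
coauthor = {SAME: 0.6, THETA: 0.4}
venue = {SAME: 0.3, DIFF: 0.2, THETA: 0.5}

fused = combine(coauthor, venue)
for subset, mass in fused.items():
    print(sorted(subset), round(mass, 4))
```

In an agglomerative setting, a fused mass of this kind could be computed for every candidate pair of clusters, with the pair carrying the strongest combined support for "same" chosen for merging; how the paper scores and selects pairs is not specified in the abstract.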

Notes

  1. http://www.informatik.uni-trier.de/ley/db/

  2. http://www.arnetminer.org/disambiguation

Acknowledgments

This work is supported by the Scientific Research Project of Yunnan University (2010YB024) and the Applied Basic Research Project of Yunnan Province (2013FB009). The work of Jun He is supported by the National Natural Science Foundation of China (61203273). We are grateful to the anonymous reviewers, whose useful comments and suggestions substantially improved this paper.

Author information

Corresponding author

Correspondence to Hao Wu.

Cite this article

Wu, H., Li, B., Pei, Y. et al. Unsupervised author disambiguation using Dempster–Shafer theory. Scientometrics 101, 1955–1972 (2014). https://doi.org/10.1007/s11192-014-1283-x

  • DOI: https://doi.org/10.1007/s11192-014-1283-x
