A Heuristic Approach to Solve Author Name Ambiguity Using Minimum Bibliographic Evidences

  • Original Research
  • SN Computer Science

Abstract

This article proposes a method for resolving author name ambiguity using the minimum available bibliographic evidence. Existing models fail on many cases because the evidence and features they need to resolve the conflict are unavailable. Most work in the literature mitigates the problem with features such as author address, email ID, homepage, and co-authors. However, using co-authors as a feature can leave ambiguity unresolved, because each co-author is itself an author whose name may be ambiguous. The proposed work attempts to resolve the problem with minimal bibliographic information, such as the author's affiliation and publication year. A two-level heuristic method built on these features is proposed. The readily available disambiguated records of 100 authors from the ArnetMiner dataset are used to set the threshold of the heuristic. The heuristic is evaluated experimentally on 20 authors from the publicly available Microsoft Academic Graph (MAG) dataset, where it outperforms the baseline approaches it is compared against.
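The abstract names only the features (affiliation and publication year) and the two-level structure, not the rules themselves. The following is a purely illustrative sketch of what such a heuristic could look like, not the authors' method. It assumes that level 1 compares affiliation strings and level 2 checks the gap between publication years. The `Record` schema, the function names, and both threshold values are hypothetical; the article sets its own threshold from ArnetMiner data.

```python
from __future__ import annotations

from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Record:
    """One publication record for an ambiguous author name (hypothetical schema)."""
    paper_id: str
    affiliation: str
    year: int


# Hypothetical values chosen for illustration only; not taken from the article.
AFFILIATION_SIM_THRESHOLD = 0.8
MAX_YEAR_GAP = 3


def affiliation_similarity(a: str, b: str) -> float:
    """String similarity in [0, 1] between two case-normalized affiliations."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def disambiguate(records: list[Record]) -> list[list[Record]]:
    """Greedily assign each record to a cluster (one cluster per presumed person)."""
    clusters: list[list[Record]] = []
    # Process in chronological order so cluster[-1] is always the latest record.
    for rec in sorted(records, key=lambda r: r.year):
        for cluster in clusters:
            # Level 1 (assumed): affiliation resembles a record already in the cluster.
            same_place = any(
                affiliation_similarity(rec.affiliation, other.affiliation)
                >= AFFILIATION_SIM_THRESHOLD
                for other in cluster
            )
            # Level 2 (assumed): publication year is close to the cluster's latest year.
            close_in_time = rec.year - cluster[-1].year <= MAX_YEAR_GAP
            if same_place and close_in_time:
                cluster.append(rec)
                break
        else:
            # No existing cluster passed both levels: treat as a new person.
            clusters.append([rec])
    return clusters
```

Calling `disambiguate` on all records that share one name string returns a list of clusters, each intended to hold the papers of a single person. The greedy single pass keeps the sketch short; a real system would need to handle authors who change affiliation, which this toy rule would split into separate clusters.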

Data Availability

The data that support the findings of this study are available on request from the corresponding author. The data are not publicly available due to restrictions, e.g., their containing information that could compromise the privacy of research participants.

Notes

  1. https://aminer.org/.

  2. https://aminer.org/disambiguation.

Funding

The authors have not disclosed any funding.

Author information

Corresponding author

Correspondence to Sovan Bhattacharya.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

https://www.microsoft.com/en-us/research/project/microsoft-academic-graph/.

This article is part of the topical collection “Research Trends in Computational Intelligence” guest edited by Anshul Verma, Pradeepika Verma, Vivek Kumar Singh and S. Karthikeyan.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Bhattacharya, S., Choudhury, P., Nandi, S. et al. A Heuristic Approach to Solve Author Name Ambiguity Using Minimum Bibliographic Evidences. SN COMPUT. SCI. 4, 733 (2023). https://doi.org/10.1007/s42979-023-02176-3

  • DOI: https://doi.org/10.1007/s42979-023-02176-3
