Learning semantic and relationship joint embedding for author name disambiguation

  • Original Article
  • Published in: Neural Computing and Applications

Abstract

Author name disambiguation is an important research topic in the academic information retrieval community. Existing methods obtain document similarity either through feature engineering on rich attribute information or from relationship information, but seldom consider the complementarity and correlation between the two. Feature engineering on attributes, especially on rich text, can capture global semantic concepts, while relationship information can encode local structural proximity in multiple academic networks. To bridge the gap between semantic and relationship information in author name disambiguation, this paper presents a joint representation learning approach that encodes both kinds of information into a common low-dimensional space. Specifically, the proposed method consists of four modules: (1) a semantic embedding module; (2) a relationship embedding module; (3) a semantic and relationship joint embedding module; and (4) a clustering module. Experimental results demonstrate that the proposed joint representation learning approach consistently outperforms state-of-the-art methods on three benchmarks.
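
The four-module pipeline named in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the abstract does not specify how each module is built, so every component here is a simplified stand-in chosen for illustration. The semantic view is a bag-of-words vector, the relationship view is a co-author indicator vector, the joint embedding is a weighted concatenation of the two normalized views (the paper learns this space instead), and clustering is average-linkage agglomeration with a cosine threshold. All function names, the `alpha` and `threshold` parameters, and the toy papers are hypothetical.

```python
import math
from itertools import combinations


def l2_normalize(vec):
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def semantic_embedding(text, vocab):
    """Stand-in for module (1): bag-of-words counts over a fixed vocabulary."""
    tokens = text.lower().split()
    return [float(tokens.count(word)) for word in vocab]


def relationship_embedding(coauthors, all_coauthors):
    """Stand-in for module (2): indicator vector over known co-authors."""
    return [1.0 if name in coauthors else 0.0 for name in all_coauthors]


def joint_embedding(sem, rel, alpha=0.5):
    """Stand-in for module (3): weighted concatenation of the normalized views.

    Assumption: the paper learns a common space; concatenation only mimics
    the idea that both views contribute to one vector per document.
    """
    s = [math.sqrt(alpha) * x for x in l2_normalize(sem)]
    r = [math.sqrt(1.0 - alpha) * x for x in l2_normalize(rel)]
    return s + r


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return sum(a * b for a, b in zip(l2_normalize(u), l2_normalize(v)))


def cluster(embeddings, threshold=0.5):
    """Stand-in for module (4): average-linkage agglomerative clustering.

    Repeatedly merges the two most similar clusters until the best
    average similarity falls below the (hypothetical) threshold.
    """
    clusters = [[i] for i in range(len(embeddings))]
    while len(clusters) > 1:
        best, best_pair = -1.0, None
        for a, b in combinations(range(len(clusters)), 2):
            sims = [cosine(embeddings[i], embeddings[j])
                    for i in clusters[a] for j in clusters[b]]
            avg = sum(sims) / len(sims)
            if avg > best:
                best, best_pair = avg, (a, b)
        if best < threshold:
            break
        a, b = best_pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return sorted(sorted(c) for c in clusters)


# Toy documents that all carry the same ambiguous author name (invented data).
papers = [
    {"title": "graph embedding for networks", "coauthors": {"A. Chen", "B. Wu"}},
    {"title": "network embedding with graph attention", "coauthors": {"A. Chen"}},
    {"title": "protein folding dynamics", "coauthors": {"C. Park"}},
    {"title": "dynamics of protein structure", "coauthors": {"C. Park", "D. Kim"}},
]

vocab = sorted({w for p in papers for w in p["title"].lower().split()})
all_coauthors = sorted({a for p in papers for a in p["coauthors"]})

joint = [
    joint_embedding(
        semantic_embedding(p["title"], vocab),
        relationship_embedding(p["coauthors"], all_coauthors),
    )
    for p in papers
]

clusters = cluster(joint)
print(clusters)  # [[0, 1], [2, 3]]
```

In this toy data neither view alone is decisive: papers 0 and 1 share only two title words (semantic cosine of about 0.45, below the threshold), but their shared co-author lifts the joint similarity to about 0.58, so they are merged. This is the kind of complementarity between semantic and relationship information that the abstract describes.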

Acknowledgements

This work was supported by the National Natural Science Foundation of China under grant number 61702031 and the Fundamental Research Funds for the Central Universities under grant number 2020JBM077. The authors would like to thank the editor and reviewers for their valuable comments and constructive suggestions to improve the paper.

Author information

Corresponding author

Correspondence to Peng Bao.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

About this article

Cite this article

Xiong, B., Bao, P. & Wu, Y. Learning semantic and relationship joint embedding for author name disambiguation. Neural Comput & Applic 33, 1987–1998 (2021). https://doi.org/10.1007/s00521-020-05088-y

  • DOI: https://doi.org/10.1007/s00521-020-05088-y
