MORE: Toward Improving Author Name Disambiguation in Academic Knowledge Graphs

Gong, Jibing; Fang, Xiaohan; Peng, Jiquan; Zhao, Yi; Zhao, Jinye; Wang, Chenlong; Li, Yangyang; Zhang, Jingyi; Drew, Steve

doi:10.1007/s13042-022-01686-5

Original Article
Published: 28 November 2022

Volume 15, pages 37–50 (2024)

International Journal of Machine Learning and Cybernetics

Jibing Gong^1,2,
Xiaohan Fang^1,2,
Jiquan Peng^1,2,
Yi Zhao^1,2 (ORCID: orcid.org/0000-0002-0919-867X),
Jinye Zhao^1,2,
Chenlong Wang^1,2,
Yangyang Li^3,
Jingyi Zhang^4 &
Steve Drew^5


Abstract

Author name disambiguation (AND) is a fundamental task in knowledge alignment for building a knowledge graph network or an online academic search system. Existing AND algorithms tend to over-split and over-merge papers, which severely degrades the performance of downstream tasks. In this paper, we demonstrate the problem of paper over-splitting and over-merging when constructing an academic knowledge graph. To address these problems, we systematically investigate and propose a unified architecture, MORE, which utilizes LightGBM and HAC FOR paper clusteRing as well as HGAT for both cluster alignmEnt and knowledge graph representation learning. Specifically, we first propose a novel representation learning method that leverages OAG-BERT to learn paper entity embeddings and utilizes SimCSE to regularize the anisotropic space of the pre-trained embeddings. We then apply LightGBM to calculate the similarity matrix of papers from the entity embeddings. We also use hierarchical agglomerative clustering (HAC) to group clusters and alleviate over-merging. Finally, considering co-author relationships, we improve the HGAT model with a hard-cross graph attention mechanism to generate semantic and structural embeddings. Experimental results on two large real-world datasets show that our proposed method achieves a 6% to 16% improvement in F1-score over the baseline models.
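
To make the clustering step concrete, the following is a minimal sketch of hierarchical agglomerative clustering over a paper-similarity matrix. It is not the authors' implementation: the matrix SIM is hand-written hypothetical data standing in for the LightGBM similarity scores, and the average-linkage rule, the function name hac_average_linkage, and the threshold values are all assumptions chosen for illustration. The sketch shows how the stopping threshold relates to the two failure modes named in the abstract: a threshold that is too low over-merges papers, and one that is too high over-splits them.

```python
from itertools import combinations


def hac_average_linkage(similarity, threshold):
    """Greedy average-linkage HAC over a symmetric similarity matrix.

    Merging stops once no pair of clusters has an average pairwise
    similarity of at least `threshold` (an assumed stopping rule).
    """
    clusters = [[i] for i in range(len(similarity))]

    def avg_sim(a, b):
        # Mean similarity over all cross-cluster paper pairs.
        return sum(similarity[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the most similar pair of current clusters.
        (x, y), best = max(
            (((p, q), avg_sim(clusters[p], clusters[q]))
             for p, q in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# Hypothetical pairwise scores for five papers sharing one author name.
SIM = [
    [1.0, 0.9, 0.8, 0.2, 0.1],
    [0.9, 1.0, 0.7, 0.3, 0.2],
    [0.8, 0.7, 1.0, 0.2, 0.1],
    [0.2, 0.3, 0.2, 1.0, 0.6],
    [0.1, 0.2, 0.1, 0.6, 1.0],
]

balanced = hac_average_linkage(SIM, threshold=0.5)      # two authors
over_merged = hac_average_linkage(SIM, threshold=0.1)   # everything merged
over_split = hac_average_linkage(SIM, threshold=0.95)   # nothing merged

print(balanced)
print(over_merged)
print(over_split)
```

With this toy matrix, a threshold of 0.5 separates papers 0 to 2 from papers 3 and 4, while the lower and higher thresholds collapse everything into one cluster or leave every paper on its own.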

Acknowledgements

This work was supported by the National Key R&D Program of China through grant 2021YFB1714800, the S&T Program of Hebei through grant 21340301D, the Innovation Capability Improvement Plan Project of Hebei Province through grant 22567626H, and the Hebei Natural Science Foundation of China through grant F2022203072.

Author information

Authors and Affiliations

School of Information Science and Engineering, Yanshan University, Qinhuangdao, China
Jibing Gong, Xiaohan Fang, Jiquan Peng, Yi Zhao, Jinye Zhao & Chenlong Wang
The Key Lab for Computer Virtual Technology and System Integration, The Key Laboratory for Software Engineering of Hebei Province, Beijing, China
Jibing Gong, Xiaohan Fang, Jiquan Peng, Yi Zhao, Jinye Zhao & Chenlong Wang
National Engineering Research Center for Risk Perception and Prevention, CAEIT, Beijing, China
Yangyang Li
School of Cyber Science and Technology, Beihang University, Beijing, China
Jingyi Zhang
Department of Electrical and Software Engineering, University of Calgary, Calgary, AB, Canada
Steve Drew

Corresponding author

Correspondence to Yi Zhao.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Gong, J., Fang, X., Peng, J. et al. MORE: Toward Improving Author Name Disambiguation in Academic Knowledge Graphs. Int. J. Mach. Learn. & Cyber. 15, 37–50 (2024). https://doi.org/10.1007/s13042-022-01686-5

Received: 16 June 2022
Accepted: 10 October 2022
Published: 28 November 2022
Issue Date: January 2024
DOI: https://doi.org/10.1007/s13042-022-01686-5
