Towards Effective Author Name Disambiguation by Hybrid Attention

Zhou, Qian; Chen, Wei; Zhao, Peng-Peng; Liu, An; Xu, Jia-Jie; Qu, Jian-Feng; Zhao, Lei

doi:10.1007/s11390-023-2070-z

Regular Paper
Published: 20 September 2024

Volume 39, pages 929–950, (2024)
Journal of Computer Science and Technology

Qian Zhou (周乾)¹,
Wei Chen (陈伟)¹,
Peng-Peng Zhao (赵朋朋)¹,
An Liu (刘安)¹,
Jia-Jie Xu (许佳捷)¹,
Jian-Feng Qu (瞿剑峰)¹ &
Lei Zhao (赵雷)¹

Abstract

Author name disambiguation (AND) is a central task in academic search that has received growing attention as the numbers of authors and academic publications increase. To tackle the AND problem, existing studies have proposed various approaches based on different types of information, such as raw document features (e.g., co-authors, titles, and keywords), the fusion feature (e.g., a hybrid publication embedding based on multiple raw document features), local structural information (e.g., a publication’s neighborhood information on a graph), and global structural information (e.g., interactive information between a node and others on a graph). However, no existing work takes all of the above information into account while fully exploiting the contribution of each raw document feature. To fill this gap, we propose a novel framework named EAND (Towards Effective Author Name Disambiguation by Hybrid Attention). Specifically, we design a novel feature extraction model, consisting of three hybrid attention mechanism layers, to extract key information from the global and local structural information generated from six similarity graphs, which are constructed based on different similarity coefficients, raw document features, and the fusion feature. Each hybrid attention mechanism layer contains three key modules: a local structural perception module, a global structural perception module, and a feature extractor. Additionally, the mean absolute error function in the joint loss function is used to introduce the structural information loss of the vector space. Experimental results on two real-world datasets demonstrate that EAND achieves superior performance, outperforming state-of-the-art methods by at least 2.74% in micro-F1 score and 3.31% in macro-F1 score.
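As an illustration of what a similarity graph over publications can look like, the following minimal sketch links two publications when a raw document feature (here, co-author sets) is similar enough. It is not the EAND implementation: the abstract does not name the similarity coefficients it uses, so the Jaccard coefficient, the threshold value, the function names, and the toy data are all assumptions made for this example.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient |a & b| / |a | b|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_similarity_graph(features: dict, threshold: float) -> dict:
    """Return {(pub_i, pub_j): weight} for pairs whose similarity reaches the threshold.

    `features` maps a publication id to a set of feature values
    (e.g., co-author names or title keywords). Hypothetical helper.
    """
    edges = {}
    for i, j in combinations(sorted(features), 2):
        weight = jaccard(features[i], features[j])
        if weight >= threshold:
            edges[(i, j)] = weight
    return edges


# Hypothetical toy data: co-author sets of three publications
# that share the same ambiguous author name.
coauthors = {
    "p1": {"alice", "bob"},
    "p2": {"alice", "bob", "carol"},
    "p3": {"dave"},
}

# Threshold 0.5 is an arbitrary choice for the example.
graph = build_similarity_graph(coauthors, threshold=0.5)
print(graph)
```

In this toy example only p1 and p2 are connected, because they share two of three distinct co-authors. Building one such graph per feature and per coefficient is one plausible way to arrive at several graphs, but the exact construction of the six graphs is described only in the full paper.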

Author information

Authors and Affiliations

School of Computer Science and Technology, Soochow University, Suzhou, 215006, China
Qian Zhou (周乾), Wei Chen (陈伟), Peng-Peng Zhao (赵朋朋), An Liu (刘安), Jia-Jie Xu (许佳捷), Jian-Feng Qu (瞿剑峰) & Lei Zhao (赵雷)

Corresponding authors

Correspondence to Wei Chen (陈伟) or Lei Zhao (赵雷).

Ethics declarations

Conflict of Interest: The authors declare that they have no conflict of interest.

Additional information

A preliminary version of the paper was published in the proceedings of ICWS 2021.

This work was supported by the Major Program of the Natural Science Foundation of Jiangsu Higher Education Institutions of China under Grant Nos. 19KJA610002 and 19KJB520050, and the National Natural Science Foundation of China under Grant No. 61902270.

Wei Chen is the principal investigator of the second and third funded projects; Lei Zhao designed the research framework and is the principal investigator of the first funded project.

Qian Zhou received his M.S. degree in computer science and technology from Soochow University, Suzhou, in 2022. Currently, he is a research assistant at the School of Computer Science and Technology, Soochow University, Suzhou. His current research interests mainly include data mining, deep learning, and natural language processing.

Wei Chen is currently an associate professor in the School of Computer Science and Technology at Soochow University, Suzhou. He received his Ph.D. degree in computer science from Soochow University, Suzhou, in 2018. His research interests include heterogeneous information network analysis, cross-platform linkage and recommendation, spatio-temporal database, and knowledge graph embedding and refinement.

Peng-Peng Zhao received his Ph.D. degree in computer science from Soochow University, Suzhou, in 2008. He is a professor at the School of Computer Science and Technology at Soochow University, Suzhou. His current research interests include data mining, deep learning, big data analysis, and recommender systems.

An Liu is a professor at the School of Computer Science and Technology, Soochow University, Suzhou. He received his Ph.D. degree in computer science from both City University of Hong Kong, Hong Kong, and University of Science and Technology of China, Hefei, in 2009. His research interests include security, privacy, trust in emerging applications, cloud computing, and services computing.

Jia-Jie Xu is an associate professor at the School of Computer Science and Technology, Soochow University, Suzhou. He received his Ph.D. and M.S. degrees from the Swinburne University of Technology, Melbourne, and the University of Queensland, Brisbane, in 2011 and 2006, respectively. His research interests mainly include spatio-temporal database systems, big data analytics, and workflow systems.

Jian-Feng Qu is a lecturer at the School of Computer Science and Technology, Soochow University, Suzhou. He received his B.S., M.S., and Ph.D. degrees in computer science from Jilin University, Changchun, in 2013, 2016, and 2019, respectively. His research interests include information extraction, data mining, natural language processing, and deep learning.

Lei Zhao is a professor at the School of Computer Science and Technology, Soochow University, Suzhou. He received his Ph.D. degree in computer science from Soochow University, Suzhou, in 2006. His recent research focuses on analyzing large graph databases effectively, efficiently, and securely.

Electronic supplementary material