Abstract
Names make up a large portion of search engine queries, and name ambiguity degrades search quality. In digital academic systems, the problem appears as large numbers of publications with ambiguous author names. Ambiguity arises when many people share an identical name or when names are abbreviated. Although several methods have been proposed over the past decade, the problem is not completely solved, and many subproblems remain to be studied. Because information is scarce, accurately distinguishing ambiguous authors from limited internal information alone is a nontrivial task. In this paper, we focus on the cold-start disambiguation task for homonymous author names, i.e., distinguishing publications written by different authors who share an identical name. We present a supervised framework named DND (short for Distributed Framework for Name Disambiguation) to solve the author disambiguation problem efficiently. DND uses the accessible information to train a robust function that measures the similarity between publications and then determines whether they belong to the same author. In traditional clustering-based approaches to author disambiguation, the number of clusters, which is the number of authors sharing the same name, is hard to predict in advance; DND instead transforms the clustering task into a linkage prediction task, which avoids specifying the number of clusters. We validate the effectiveness of DND on two real-world datasets. The experimental results indicate that DND achieves competitive performance compared with the baselines.
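The idea of replacing clustering with linkage prediction can be illustrated with a minimal sketch. It is not the DND implementation: the function `pair_score`, its fixed weights, the `coauthors` and `keywords` fields, and the threshold are all hypothetical stand-ins for a trained similarity function, and merging linked publications by connected components is an assumption made here for illustration. The sketch only shows why the number of authors never has to be specified in advance.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap of two collections; 0.0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pair_score(p, q, weights=(0.6, 0.4)):
    """Hypothetical stand-in for a trained similarity function.

    A real system would learn this from labeled publication pairs;
    the fixed weights here are illustrative assumptions only.
    """
    return (weights[0] * jaccard(p["coauthors"], q["coauthors"])
            + weights[1] * jaccard(p["keywords"], q["keywords"]))


def group_by_links(pubs, threshold=0.3):
    """Link every pair scoring at or above the threshold, then merge
    linked publications with union-find (assumed merging strategy)."""
    parent = list(range(len(pubs)))

    def find(i):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(pubs)), 2):
        if pair_score(pubs[i], pubs[j]) >= threshold:
            parent[find(i)] = find(j)

    # Collect publication indices by their root; one group per author.
    groups = {}
    for i in range(len(pubs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

The number of groups returned is a by-product of the predicted links, so no cluster count is passed in. The cost of this naive version is quadratic in the number of publications sharing a name, which is one reason a distributed setting is attractive.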



Acknowledgements
This work was supported in part by the National Natural Science Foundation of China under grant no. 61873288; in part by the CERNET Innovation Project (NGII20190517); in part by Technology Projects, Hunan Key Laboratory for Internet of Things in Electricity (2019TP1016); and in part by the Fundamental Research Funds for the Central Universities of Central South University (2020zzts594).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest regarding the contents of the present article.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Chen, Y., Jiang, Z., Gao, J. et al. A supervised and distributed framework for cold-start author disambiguation in large-scale publications. Neural Comput & Applic 35, 13093–13108 (2023). https://doi.org/10.1007/s00521-020-05684-y
DOI: https://doi.org/10.1007/s00521-020-05684-y