A supervised and distributed framework for cold-start author disambiguation in large-scale publications

Chen, Yibo; Jiang, Zhiyi; Gao, Jianliang; Du, Hongliang; Gao, Liping; Li, Zhao

doi:10.1007/s00521-020-05684-y

S.I. : Deep Social Computing
Published: 05 March 2021

Volume 35, pages 13093–13108, (2023)
Neural Computing and Applications

Yibo Chen³,
Zhiyi Jiang⁴,
Jianliang Gao⁴,
Hongliang Du⁴,
Liping Gao¹ &
Zhao Li²

Abstract

Names make up a large portion of search-engine queries, and name ambiguity degrades the quality of search results. In digital academic systems, the problem appears as a large number of publications with ambiguous author names. Name ambiguity arises because many people share identical names and because names are often abbreviated. Although a number of methods have been proposed over the past decade, the problem is not completely solved, and many subproblems remain to be studied. Because little information is available, accurately distinguishing ambiguous authors from limited internal information alone is a nontrivial task. In this paper, we focus on the cold-start disambiguation task with homonymous author names, i.e., distinguishing publications written by different authors who share an identical name. We present a supervised framework named DND (Distributed Framework for Name Disambiguation) to solve the author disambiguation problem efficiently. DND uses the accessible information to train a robust function that measures the similarity between publications and then determines whether they belong to the same author. In traditional clustering-based approaches to author disambiguation, the number of clusters, i.e., the number of authors sharing the same name, is hard to predict in advance; DND instead transforms the clustering task into a linkage prediction task, which avoids specifying the number of clusters. We validate the effectiveness of DND on two real-world datasets. The experimental results indicate that DND achieves competitive performance compared with the baselines.
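
The idea of replacing clustering with linkage prediction can be illustrated with a minimal sketch. This is not the DND implementation: the record fields (`coauthors`, `keywords`), the hand-set weights in `link_score`, and the `threshold` value are all assumptions made for illustration, and in a supervised setting `link_score` would be replaced by a function trained on labeled publication pairs. The sketch only shows that once pairwise links are predicted, author groups fall out as connected components, so the number of authors is never supplied.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def link_score(p, q):
    """Hypothetical stand-in for a learned pairwise similarity function.

    The 0.6 / 0.4 weights are illustrative assumptions, not trained values.
    """
    coauthor = jaccard(set(p["coauthors"]), set(q["coauthors"]))
    keyword = jaccard(set(p["keywords"]), set(q["keywords"]))
    return 0.6 * coauthor + 0.4 * keyword


def disambiguate(pubs, threshold=0.3):
    """Group publications by predicted links; the cluster count is not an input."""
    parent = list(range(len(pubs)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Predict a link for every pair and merge the linked publications.
    for i, j in combinations(range(len(pubs)), 2):
        if link_score(pubs[i], pubs[j]) >= threshold:
            parent[find(i)] = find(j)

    # Each connected component is treated as one author.
    groups = {}
    for i in range(len(pubs)):
        groups.setdefault(find(i), []).append(pubs[i]["id"])
    return sorted(groups.values())


# Toy records for publications that all carry the same ambiguous author name.
pubs = [
    {"id": "p1", "coauthors": ["A. Smith", "B. Lee"], "keywords": ["graph", "embedding"]},
    {"id": "p2", "coauthors": ["A. Smith"], "keywords": ["graph", "clustering"]},
    {"id": "p3", "coauthors": ["C. Wong"], "keywords": ["protein", "folding"]},
]
clusters = disambiguate(pubs)
```

In this toy input, `p1` and `p2` share a coauthor and a keyword and are linked, while `p3` shares nothing with either and stays on its own. Scoring every pair is quadratic in the number of publications per name, which is one reason a large-scale system would distribute this step.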

Notes

http://howmanyofme.com, accessed July 1, 2020.
https://dblp.uni-trier.de.
https://academic.microsoft.com/.
https://www.aminer.cn.
http://spark.apache.org/.
https://github.com/xxx/xxx.
https://www.biendata.com/competition/aminer2019/.

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under grant no. 61873288; in part by the CERNET Innovation Project (NGII20190517); in part by Technology Projects, Hunan Key Laboratory for Internet of Things in Electricity (2019TP1016); and in part by the Fundamental Research Funds for the Central Universities of Central South University (2020zzts594).

Author information

Authors and Affiliations

Huaihua University, Huaihua, China
Liping Gao
Alibaba Group, Hangzhou, China
Zhao Li
Hunan Key Laboratory for Internet of Things in Electricity, Changsha, China
Yibo Chen
Central South University, Changsha, China
Zhiyi Jiang, Jianliang Gao & Hongliang Du

Corresponding authors

Correspondence to Liping Gao or Zhao Li.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest regarding the contents of the present article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Chen, Y., Jiang, Z., Gao, J. et al. A supervised and distributed framework for cold-start author disambiguation in large-scale publications. Neural Comput & Applic 35, 13093–13108 (2023). https://doi.org/10.1007/s00521-020-05684-y

Received: 16 July 2020
Accepted: 28 December 2020
Published: 05 March 2021
Issue Date: June 2023
DOI: https://doi.org/10.1007/s00521-020-05684-y
