Exploiting similarities across multiple dimensions for author name disambiguation

Pooja, KM.; Mondal, Samrat; Chandra, Joydeep

doi:10.1007/s11192-021-04101-y

Published: 18 July 2021

Volume 126, pages 7525–7560 (2021)

Scientometrics

Abstract

In bibliometric analysis, ambiguity in author names may lead to erroneous aggregation of records. Author name disambiguation techniques attempt to address this issue by attributing records to the corresponding author. Name disambiguation has been widely studied as a clustering task. However, maintaining consistent accuracy across datasets remains a major challenge. Recent efforts have used representation learning techniques to map records to an embedding space in which clusters can be determined. However, some of these models that use supervised global embeddings fail to generalize across datasets, while others lag in accuracy. In this paper, we propose a method that uses two independent relations among documents, co-authorship and document meta-content, to generate a latent representation of documents that generalizes across datasets consisting of different sets of features. Through rigorous validation, we find that the proposed approach outperforms several state-of-the-art methods by a significant margin on standard measures such as pairwise F1, the K metric, and BF1 scores. We have also validated the performance of our method with a statistical test.
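
To make two of the evaluation measures named above concrete, the following is a minimal Python sketch of pairwise F1 and the K metric as they are commonly defined in the name disambiguation literature. It is not taken from the paper: the function names, the toy labels, and the definition of K as the geometric mean of average cluster purity (ACP) and average author purity (AAP) are assumptions, and the article's exact formulation may differ.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def same_cluster_pairs(labels):
    """Return the set of index pairs (i, j), i < j, that share a label."""
    return {
        (i, j)
        for i, j in combinations(range(len(labels)), 2)
        if labels[i] == labels[j]
    }


def pairwise_f1(true_labels, pred_labels):
    """Pairwise F1: harmonic mean of precision and recall over record pairs."""
    true_pairs = same_cluster_pairs(true_labels)
    pred_pairs = same_cluster_pairs(pred_labels)
    correct = len(true_pairs & pred_pairs)
    precision = correct / len(pred_pairs) if pred_pairs else 0.0
    recall = correct / len(true_pairs) if true_pairs else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def k_metric(true_labels, pred_labels):
    """K metric, assumed here to be sqrt(ACP * AAP)."""
    n = len(true_labels)
    # overlap[(p, t)] = number of records in predicted cluster p and true cluster t
    overlap = Counter(zip(pred_labels, true_labels))
    pred_sizes = Counter(pred_labels)
    true_sizes = Counter(true_labels)
    acp = sum(c * c / pred_sizes[p] for (p, _), c in overlap.items()) / n
    aap = sum(c * c / true_sizes[t] for (_, t), c in overlap.items()) / n
    return sqrt(acp * aap)


# Hypothetical toy data: five records, two true authors, two predicted clusters.
true_labels = ["a", "a", "a", "b", "b"]
pred_labels = [0, 0, 1, 1, 1]

print(pairwise_f1(true_labels, pred_labels))
print(k_metric(true_labels, pred_labels))
```

Both measures equal 1.0 when the predicted clusters match the true authors exactly, and both fall when clusters are split or merged, which is why they are typically reported together.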

Notes

https://github.com/yaya213/DBLP-Name-Disambiguation-Dataset.
http://clgiles.ist.psu.edu/data/.
Experimental results for the state-of-the-art methods were obtained by running their released code on the experimental dataset.

Acknowledgement

This work was supported by the Visvesvaraya Ph.D. Scheme, Ministry of Electronics and Information Technology, Government of India under Award MEITY-PHD-2517.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Indian Institute of Technology Patna, Patna, 801103, India
KM. Pooja, Samrat Mondal & Joydeep Chandra

Corresponding author

Correspondence to KM. Pooja.

About this article

Cite this article

Pooja, K., Mondal, S. & Chandra, J. Exploiting similarities across multiple dimensions for author name disambiguation. Scientometrics 126, 7525–7560 (2021). https://doi.org/10.1007/s11192-021-04101-y

Received: 23 September 2020
Accepted: 05 July 2021
Published: 18 July 2021
Issue Date: September 2021
DOI: https://doi.org/10.1007/s11192-021-04101-y
