DOI: 10.1145/3383583.3398568
poster

Author Name Disambiguation in PubMed using Ensemble-Based Classification Algorithms

Published: 01 August 2020

Abstract

Author name ambiguity is a common problem in digital libraries. The problem occurs because multiple individuals may share the same name and the same individual may be represented by various names. Researchers have proposed various techniques for author name disambiguation (AND). In this paper, we study AND in the context of research publications indexed in the PubMed citation database. We perform an empirical study where we experiment with two ensemble-based classification algorithms, namely, random forest and gradient boosted decision trees, on a publicly available corpus of manually disambiguated author names from PubMed. Results show that random forest produces higher accuracy, precision, recall and F1-score, but gradient boosted trees perform competitively. We also determine which features are most discriminative given the feature set and the classifiers.
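
The four reported measures can be illustrated with a minimal sketch. The abstract does not describe the evaluation protocol, so this assumes a binary setup in which label 1 means "two records refer to the same author" and label 0 means they do not. The names `BinaryMetrics` and `binary_metrics` and the toy labels are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> BinaryMetrics:
    """Compute accuracy, precision, recall and F1 for 0/1 labels.

    Assumption: 1 = same author, 0 = different authors (hypothetical encoding).
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    # Fall back to 0.0 when a denominator is zero (no positive predictions or labels).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return BinaryMetrics(accuracy, precision, recall, f1)


if __name__ == "__main__":
    # Toy labels for eight record pairs; purely illustrative, not data from the paper.
    gold = [1, 1, 1, 0, 0, 0, 1, 0]
    predicted = [1, 0, 1, 0, 1, 0, 1, 1]
    print(binary_metrics(gold, predicted))
```

Running the same function on the predictions of two classifiers over one held-out set gives a like-for-like comparison of the kind the abstract reports for random forest and gradient boosted trees.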

Published In

JCDL '20: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020
August 2020
611 pages
ISBN: 9781450375856
DOI: 10.1145/3383583
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. author name disambiguation
  2. ensemble learning
  3. gradient boosted tree
  4. machine learning
  5. pubmed
  6. random forest

Qualifiers

  • Poster

Funding Sources

  • Ministry of Human Resource Development, Government of India

Conference

JCDL '20
Acceptance Rates

Overall Acceptance Rate 415 of 1,482 submissions, 28%

Cited By

  • (2025) Author name disambiguation based on heterogeneous graph neural network. PLOS ONE 20:2 (e0310992). DOI: 10.1371/journal.pone.0310992. Online publication date: 26-Feb-2025
  • (2024) Towards Effective Author Name Disambiguation by Hybrid Attention. Journal of Computer Science and Technology 39:4 (929-950). DOI: 10.1007/s11390-023-2070-z. Online publication date: 1-Jul-2024
  • (2024) Author name disambiguation literature review with consolidated meta-analytic approach. International Journal on Digital Libraries 25:4 (765-785). DOI: 10.1007/s00799-024-00398-1. Online publication date: 10-Apr-2024
  • (2022) Toward a New Paradigm for Author Name Disambiguation. IEEE Access 10 (76055-76068). DOI: 10.1109/ACCESS.2022.3190088. Online publication date: 2022
  • (2021) S2AND: A Benchmark and Evaluation System for Author Name Disambiguation. 2021 ACM/IEEE Joint Conference on Digital Libraries (JCDL) (170-179). DOI: 10.1109/JCDL52503.2021.00029. Online publication date: Sep-2021
  • (2021) Multiple Features Driven Author Name Disambiguation. 2021 IEEE International Conference on Web Services (ICWS) (506-515). DOI: 10.1109/ICWS53863.2021.00071. Online publication date: Sep-2021
