Framework for Author Name Disambiguation in Scientific Papers Using an Ontological Approach and Deep Learning

  • Conference paper
Knowledge Graphs and Semantic Web (KGSWC 2022)

Abstract

This paper addresses the disambiguation of author names in scientific papers, focusing on two problems: synonyms, where two or more differently written names denote the same person, and homonyms, where several authors share the same name. To address both problems, we propose a framework that combines an ontological model with a deep learning model. First, we describe the design of the ontology model, the automatic ontology creation process, and the construction of a weighted co-author network through a set of semantic rules and queries. Second, the selected features are preprocessed during the attribute engineering process to compute a similarity indicator for each feature. Third, the similarity indicators are reduced to a vector space model and used as input to the deep learning-based author name disambiguation method, which models the different types of features. Fourth, the proposed framework is tested on smaller subsets of LAGOS-AND, a large gold standard dataset of scientific papers drawn from several international databases, and achieves promising results compared with similar solutions proposed in the literature.
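The second and third steps of the abstract (one similarity indicator per feature, collected into a vector for a downstream model) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the feature set (name, co-authors, affiliation), the similarity measures (a character-level ratio, Jaccard overlap, exact match), the record layout, and the function names are all hypothetical. The paper's ontology, weighted co-author network, and deep learning model are not reproduced here.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Character-level similarity of two name strings, in [0, 1]."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def jaccard(a: set, b: set) -> float:
    """Jaccard overlap of two sets; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_vector(rec_a: dict, rec_b: dict) -> list:
    """Return one similarity indicator per feature for a pair of records."""
    return [
        # Name form similarity (relevant to the synonym problem).
        name_similarity(rec_a["name"], rec_b["name"]),
        # Shared co-authors (a simple stand-in for co-author network evidence).
        jaccard(set(rec_a["coauthors"]), set(rec_b["coauthors"])),
        # Exact affiliation match as a binary indicator.
        1.0 if rec_a["affiliation"] == rec_b["affiliation"] else 0.0,
    ]


# Hypothetical records, invented for illustration only.
rec_1 = {"name": "J. Smith", "coauthors": ["A. Lee", "B. Chan"], "affiliation": "Univ X"}
rec_2 = {"name": "John Smith", "coauthors": ["A. Lee", "C. Diaz"], "affiliation": "Univ X"}

vector = similarity_vector(rec_1, rec_2)
print(vector)
```

In a pipeline of this shape, a vector like this would be computed for each pair of candidate records and passed to a classifier that decides whether the two records refer to the same author.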

This work is partially supported by Project 3 “ICT supporting the educational processes and the knowledge management in higher education (ELINF)” of the NETWORK University Cooperation “Strengthening of the role of ICT in Cuban Universities for the development of the society”. We thank Carlos Alberto Morell for his useful suggestions and ideas and the team of Li Zhang, Wei Lu and Jinqing Yang for providing the corpus used to train the Doc2Vec model of the gold standard dataset LAGOS-AND.

Notes

  1. http://xmlns.com/foaf/0.1/

  2. http://www.w3c.org/2004/02/skos/

  3. http://www.w3c.org/2003/01/geo/

  4. https://purl.org/ontology/bibo/

  5. https://bioportal.bioontology.org/ontologies/VIVO

  6. https://d-nb.info/standards/elementset/gnd

References

  1. Shoaib, M., Daud, A., Amjad, T.: Author name disambiguation in bibliographic databases: a survey. arXiv preprint arXiv:2004.06391, pp. 1–24 (2020)

  2. Wang, P., Zhao, J., Huang, K., Xu, B.: A unified semi-supervised framework for author disambiguation in academic social network. In: Decker, H., Lhotská, L., Link, S., Spies, M., Wagner, R.R. (eds.) Conference 2014, LNCS, vol. 8645, pp. 1–16. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10085-2_1

  3. Hussain, I., Asghar, S.: A survey of author name disambiguation techniques: 2010–2016. Knowl. Eng. Rev. 32, 1–24 (2017). https://doi.org/10.1017/S0269888917000182

  4. Ferreira, A.A., Gonçalves, M.A., Laender, A.H.F.: Automatic disambiguation of author names in bibliographic repositories. In: Synthesis Lectures on Information Concepts, Retrieval, and Services, vol. 12(1), pp. 1–146. Morgan & Claypool Publishers (2020). https://doi.org/10.2200/S01011ED1V01Y202005ICR070

  5. Zhang, L., Lu, W., Yang, J.: LAGOS-AND: a large, gold standard dataset for scholarly author name disambiguation. arXiv preprint arXiv:2104.01821, pp. 1–27 (2021)

  6. Fiannaca, A., La Rosa, M., Gaglio, S., Rizzo, R., Urso, A.: An ontological-based knowledge organization for bioinformatics workflow management system. EMBnet. J. 18(B), 110–112 (2012). https://doi.org/10.14806/ej.18.B.570

  7. Kurki, J., Hyvönen, E.: Authority control of people and organizations on the semantic web. In: Proceedings of the International Conferences on Digital Libraries and the Semantic Web 2009 (ICSD2009), September 2009, Trento, Italy, p. 15 (2009)

  8. Pattuelli, M.C.: From uniform identifiers to graphs, from individuals to communities: what we talk about when we talk about linked person data. In: Challenges and Opportunities for Knowledge Organization in the Digital Age, pp. 571–580. Ergon-Verlag (2018). https://doi.org/10.5771/9783956504211-571

  9. Kim, J.: Scale free collaboration networks: an author name disambiguation perspective. J. Assoc. Inf. Sci. Technol. 70(7), 685–700 (2019). https://doi.org/10.1002/asi.24158

  10. Thenmozhi, D., Aravindan, C.: Ontology-based Tamil-English cross-lingual information retrieval system. Sadhana 43(157), 1–14 (2018). https://doi.org/10.1007/s12046-018-0942-7

  11. Zaman, G., et al.: An ontological framework for information extraction from diverse scientific sources. IEEE Access 9, 42111–42124 (2021). https://doi.org/10.1109/ACCESS.2021.3063181

  12. Hassell, J., Aleman-Meza, B., Arpinar, I.B.: Ontology-Driven Automatic Entity Disambiguation in Unstructured Text. In: Cruz, I., Decker, S., Allemang, D., Preist, C., Schwabe, D., Mika, P., Uschold, M., Aroyo, L.M. (eds.) ISWC 2006. LNCS, vol. 4273, pp. 44–57. Springer, Heidelberg (2006). https://doi.org/10.1007/11926078_4

  13. Park, Y.-T., Kim, J.-M.: OnCU system: ontology-based category utility approach for author name disambiguation. In: 2nd International Conference on Ubiquitous Information Management and Communication Proceedings, pp. 63–68. New York, USA (2008). https://doi.org/10.1145/1352793.1352807

  14. Lu, Z., Yan, Z., He, L.: OnPerDis: ontology-based personal name disambiguation on the web. In: 2013 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT) Proceedings, vol. 1, pp. 185–192. IEEE (2013). https://doi.org/10.1109/WI-IAT.2013.28

  15. Kurakawa, K., et al.: Researcher Name Resolver: identifier management system for Japanese researchers. Int. J. Digit. Libr. 14(1–2), 39–58 (2014). https://doi.org/10.1007/s00799-014-0109-z

  16. Han, H., Yao, C., Fu, Y., Yu, Y., Zhang, Y., Xu, S.: Semantic fingerprints-based author name disambiguation in Chinese documents. Scientometrics 111(3), 1879–1896 (2017). https://doi.org/10.1007/s11192-017-2338-6

  17. Bravo, M., Reyes-Ortiz, J.A., Cruz, I.: Researcher profile ontology for academic environment. Book Sect. Adv. Intell. Syst. Comput. 943, 799–817 (2019). https://doi.org/10.1007/978-3-030-17795-960

  18. Färber, M., Ao, L.: The Microsoft Academic Knowledge Graph enhanced: author name disambiguation, publication classification, and embeddings. Quantitative Sci. Stud. 3(1), 51–98 (2022). https://doi.org/10.1162/qss_a_00183

  19. Santini, C., Gesese, G.A., Peroni, S., Gangemi, A., Sack, H., Alam, M.: A knowledge graph embeddings based approach for author name disambiguation using literals. Scientometrics 127(8), 4887–4912 (2022). https://doi.org/10.1007/s11192-022-04426-2

  20. Gnoyke, P., Matta, K.: Author name disambiguation by clustering based on deep learned pairwise similarities, pp. 0–12, May (2020)

  21. Firdaus, F., et al.: Author identification in bibliographic data using deep neural networks. TELKOMNIKA (Telecommun. Comput. Electron. Control) 19(3), 911–919 (2021). https://doi.org/10.12928/telkomnika.v19i3.18877

  22. Ahmedi, L., Abazi-Bexheti, L., Kadriu, A.: A uniform semantic web framework for co-authorship networks. In: IEEE Ninth International Conference on Dependable, Autonomic and Secure Computing Proceedings, no. 2, pp. 958–965 (2011). https://doi.org/10.1109/DASC.2011.159

  23. Gómez-Pérez, A., Suárez-Figueroa, M.C.: NeOn methodology for building ontology networks: a scenario-based methodology (2009)

  24. Suárez-Figueroa, M.C., Gómez-Pérez, A., Mariano, F.-L.: The NeOn methodology framework: a scenario-based methodology for ontology development. Appl. Ontol. 10(2), 107–145 (2015). https://doi.org/10.3233/AO-150145

  25. Leiva-Mederos, A., García-Duarte, D., Gálvez-Lio, D., Hidalgo-Delgado, Y., Senso-Ruíz, J.S.: An ontological model for the failure detection in power electric systems. In: Iberoamerican Knowledge Graphs and Semantic Web Conference Proceedings, pp. 130–146 (2020). https://doi.org/10.1007/978-3-030-65384-2

  26. Díaz-de-la-Paz, L., Riestra-Collado, F.N., García-Mendoza, J.L., González-Gonzalez, L.M., Leiva-Mederos, A.A., Taboada-Crispi, A.: Weights estimation in the completeness measurement of bibliographic metadata. Comput. Sist. 25(1), 117–128 (2021). https://doi.org/10.13053/cys-25-1-3355

  27. Le, Q.V., Mikolov, T.: Distributed representations of sentences and documents. In: International Conference on Machine Learning Proceedings, arXiv preprint arXiv:1405.4053, vol. 32(2), pp. 1188–1196 (2014)

Author information

Corresponding author

Correspondence to Lisandra Díaz-de-la-Paz.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Díaz-de-la-Paz, L., Concepción-Pérez, L., Portal-Díaz, J.A., Taboada-Crispi, A., Leiva-Mederos, A.A. (2022). Framework for Author Name Disambiguation in Scientific Papers Using an Ontological Approach and Deep Learning. In: Villazón-Terrazas, B., Ortiz-Rodriguez, F., Tiwari, S., Sicilia, MA., Martín-Moncunill, D. (eds) Knowledge Graphs and Semantic Web. KGSWC 2022. Communications in Computer and Information Science, vol 1686. Springer, Cham. https://doi.org/10.1007/978-3-031-21422-6_16

  • DOI: https://doi.org/10.1007/978-3-031-21422-6_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-21421-9

  • Online ISBN: 978-3-031-21422-6

  • eBook Packages: Computer Science, Computer Science (R0)
