Abstract
Although bibliographic databases such as Web of Science (WoS) have greatly promoted the development of scientometrics and informetrics, they are not free of errors. The main purpose of this work is to determine which types of DOI errors occur in cited references, how often each type occurs, and whether these errors can be corrected automatically. After careful analysis, several classic DOI errors of cited references, such as prefix-type, suffix-type and other-type errors, are identified. A cleaning method based on regular expressions is then put forward. Experimental results on bibliographic data in the gene editing field from the WoS database indicate that our cleaning approach can largely improve the quality of the DOI names of cited references.
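To illustrate what a regular-expression-based cleaning step of this kind might look like, the following is a minimal sketch and not the method evaluated in the paper. The function name clean_doi, the three patterns, and the choice of which leading and trailing characters count as noise are all assumptions made for illustration; the abstract does not specify the actual rules.

```python
import re

# All three patterns are hypothetical and chosen for illustration only;
# they are not the cleaning rules from the paper.

# Prefix-type noise: a resolver URL and/or a literal "DOI" label before the name.
_PREFIX_NOISE = re.compile(
    r"^(?:https?://(?:dx\.)?doi\.org/|doi\s*:?\s*)+", re.IGNORECASE
)
# Suffix-type noise: trailing whitespace or sentence punctuation.
_SUFFIX_NOISE = re.compile(r"[\s.,;:]+$")
# Rough shape check: "10.", a 4-9 digit registrant code, "/", a non-empty suffix.
_DOI_SHAPE = re.compile(r"^10\.\d{4,9}/\S+$")


def clean_doi(raw):
    """Return a cleaned DOI name, or None if the result still looks malformed."""
    doi = raw.strip()
    doi = _PREFIX_NOISE.sub("", doi)  # drop leading labels and resolver URLs
    doi = _SUFFIX_NOISE.sub("", doi)  # drop trailing punctuation
    return doi if _DOI_SHAPE.match(doi) else None


if __name__ == "__main__":
    for raw in ["DOI 10.1007/s11192-019-03162-4.",
                "https://doi.org/10.1016/j.joi.2019.03.019",
                "not-a-doi"]:
        print(repr(raw), "->", clean_doi(raw))
```

The sketch returns None rather than guessing when the cleaned string does not match the expected shape, so such records can be set aside for manual inspection. Because a valid DOI name may itself end in punctuation, a suffix rule like the one assumed here would need to be checked against real data before use.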
Acknowledgements
Our gratitude goes to the anonymous reviewers and the editor for their valuable comments.
Additional information
This work was partially supported by the Social Science Foundation of Beijing [Grant Number 17GLB074] and the Natural Science Foundation of Guangdong Province [Grant Number 2018A030313695].
About this article
Cite this article
Xu, S., Hao, L., An, X. et al. Types of DOI errors of cited references in Web of Science with a cleaning method. Scientometrics 120, 1427–1437 (2019). https://doi.org/10.1007/s11192-019-03162-4
DOI: https://doi.org/10.1007/s11192-019-03162-4