Abstract
Under the data-driven research paradigm, research software plays a crucial role in nearly every stage of scientific inquiry. Scholars increasingly advocate citing software formally in academic publications, on par with traditional research outputs. However, software is rarely cited consistently: a single software entity can be cited as different objects, and its citations can change over time. These issues have been largely overlooked in existing empirical research on software citation. To fill these gaps, the present study analyzes a longitudinal dataset of the citation formats of all R packages, collected in 2021 and 2022, to understand the citation formats of R packages, important members of the open-source software family, and how those formats evolve over time. In particular, we investigate the document types underlying the citations and which metadata elements in the citation formats changed over time. Furthermore, we offer an in-depth analysis of the disciplinarity of journal articles cited as software (software papers). Through this research, we aim to contribute to a better understanding of the complexities of software citation and to inform future software citation policies and infrastructure.



About this article
Cite this article
Wang, Y., Li, K. How do official software citation formats evolve over time? A longitudinal analysis of R programming language packages. Scientometrics 129, 3997–4019 (2024). https://doi.org/10.1007/s11192-024-05064-6
DOI: https://doi.org/10.1007/s11192-024-05064-6