
Journal of Informetrics

Volume 13, Issue 1, February 2019, Pages 397–406

Regular article
How important is software to library and information science research? A content analysis of full-text publications

https://doi.org/10.1016/j.joi.2019.02.002

Highlights

  • This paper investigates the contributions of software to LIS research.

  • We explore the extent to which researchers follow software citation instructions.

  • Nearly 30% of the LIS articles in our sample explicitly mention and use software.

  • A substantial proportion of scholars do not follow software citation instructions.

Abstract

We investigate the contributions of scientific software to library and information science (LIS) research using a sample of 572 English-language articles published in 13 journals in 2008, 2011, 2014, and 2017. In particular, we examine the use and citation of software freely available for academic use in the LIS literature; we also explore the extent to which researchers follow software citation instructions provided by software developers. Twenty-seven percent of the LIS journal articles in our sample explicitly mention and use software. Yet although LIS researchers are becoming increasingly reliant on software that is freely available for academic use, many still fail to include formal citations of such software in their publications. We also find that a substantial proportion of researchers, when documenting software use, do not cite the software in the manner recommended by its developers.

Introduction

In the current scientific reward system, scientists’ impact is largely assessed via their publication history. This tendency has driven scientists to pursue publications as the end product of their research (Fanelli, 2010; Jacob & Lefgren, 2011; Wang, Liu, Ding, & Wang, 2012). Non-publication outputs, such as data and software, have long been undervalued in comparison with publications (Belter, 2014; Hafer & Kirkpatrick, 2009; Poisot, 2015). In recent years, however, non-publication outputs (e.g., scientific data and software) have been produced in ever greater volume and have played an increasingly important role in advancing scientific theory and practice (Belter, 2014; Chao, 2011; Howison, Deelman, McLennan, Da Silva, & Herbsleb, 2015). As the importance of non-publication outputs is increasingly recognized, some funding agencies, such as the U.S. National Science Foundation and the Higher Education Funding Council for England, have begun to include software, research datasets, and other non-traditional outputs in their assessment of investigators’ intellectual contributions (National Science Foundation, 2013; Research Excellence Framework, 2013).

Among non-publication research outputs, scientific data has attracted the most academic attention because of the widespread recognition that “science is becoming data-intensive and collaborative” (National Science Foundation, 2010; Tenopir et al., 2011). Many researchers have invested considerable effort in the study of scientific data from numerous perspectives, such as data sharing and reuse, data curation, and data citation (Altman, Borgman, Crosas, & Matone, 2015; Mooney & Newton, 2012; Nelson, 2009; Piwowar & Vision, 2013; Wallis, Rolando, & Borgman, 2013; Witt, Carlson, Brandt, & Cragin, 2009). Compared with scientific data, scientific software has garnered less attention from the academic community and has not been widely valued as an academic contribution. Owing to the wide use of commercial software, software has long been regarded as a supporting service rather than a research output (Howison & Herbsleb, 2014). However, the open source movement has produced vast quantities of free software, much of which has found extensive use in the scientific community in recent years (Huang et al., 2013; Pan, Yan, Wang, & Hua, 2015). Moreover, a substantial proportion of scientists spend a considerable amount of their research time developing software tools to facilitate their work (Poisot, 2015; Prabhu et al., 2011); in many cases, these tools are then made publicly available (Hannay et al., 2009; Nguyen-Hoan, Flint, & Sankaranarayana, 2010). There is evidence that these developers are concerned with the use and impact of their software (Trainer, Chaihirunkarn, Kalyanasundaram, & Herbsleb, 2015), and that scientific end users are likewise interested in knowing what software others have used (Howison et al., 2015; Huang et al., 2013). Thus, some scholars have begun to investigate the use and impact of software in scientific publications (e.g., Li, Yan, & Feng, 2017; Pan, Yan, & Hua, 2016).

Some such studies have focused on biology, where researchers have established the important role that software plays in biological research (Howison & Bullard, 2016; Yang, Rousseau, Wang, & Huang, 2018). Other studies have explored the use and impact of particular software tools, such as R, CiteSpace, HistCite, and VOSviewer, in scientific publications; these tools have likewise been found to have a substantial impact on scientific research (Li et al., 2017; Pan, Yan, Cui, & Hua, 2018). To date, however, few studies have quantified the impact of scientific software on library and information science (LIS) research. One previous study used a terminology list to investigate the proportion of LIS articles containing computing terms in the title, abstract, or keywords, finding that about two thirds of articles published after 2000 mentioned computing technologies (Thelwall & Maflahi, 2015). That study, however, did not analyze software as a single object, distinct from other technological terms and resources. The present study fills this gap by examining the extent to which scientific software is explicitly mentioned and used in full-text LIS articles. Here, scientific software is defined broadly as software used for scientific purposes, including both software designed purely to facilitate research work and software meant for other kinds of work (e.g., office software). Because software that is used but never named cannot be identified consistently, this study focuses on software that is explicitly mentioned in the articles; software used but not mentioned is ignored. For instance, if a study stated that “a program was written to process the text of each video's title and description,” that program was not included in our analysis.

Citation count, often used to assess the impact of publications and data (Belter, 2014; Cartes-Velásquez & Manterola Delgado, 2014), seems suitable for measuring the impact of software as well. However, our previous study found that more than 40% of the software tools used in PLOS ONE articles received no formal citation (Pan et al., 2015). Howison and Bullard (2016) likewise found that 56% of software mentions in the biology literature did not include a formal citation. Software “uncitedness” has also been shown to be prevalent in bioinformatics papers (Yang et al., 2018). Taken together, these studies demonstrate that a considerable proportion of software tools are not formally cited in scientific publications. As yet, however, little is known about the extent to which software freely available for academic use is cited in the scientific literature. In this article, we define freely available software as software that can be obtained for academic use without payment, including open source software (e.g., SciMAT and Weka) and non-open source software that is free for academic use (e.g., Sci2 Tool and CiteSpace). Earlier evidence suggests that extrinsic benefits, such as citations and career advancement, motivate scientists to develop and share software (Howison & Herbsleb, 2011; Roberts, Hann, & Slaughter, 2006). A study focusing on software that is freely available for academic use will thus illuminate the extent to which developers receive credit for software development and sharing.

Considering that many researchers cite publications but fail to cite software, some scholars have proposed alternative metrics, in addition to citation count, for evaluating the impact of software. They suggest that the numbers of mentions, downloads, users, registered users, user messages, and user reviews can serve as indicators of this impact (Howison et al., 2015; Pan et al., 2016; Thelwall & Kousha, 2016; Zhao & Wei, 2017). These indicators are no doubt useful, but accurate data for some of them are difficult to collect. For instance, if a software tool that can be downloaded without payment or registration is distributed via multiple websites, an accurate user count is hard to obtain. Moreover, some of these indicators may paint a biased picture of the academic impact of scientific software; for example, some users may download a software tool multiple times without ever using it in their research. Faced with such circumstances, other scholars hold that greater effort must be made to improve the practice of software citation, for example by creating software citation principles and developing tools that support software citation (Smith, Katz, & Niemeyer, 2016; Soito & Hwang, 2016). Certainly, much work remains to be done to improve software citation practice and the efficacy of research evaluation.

In this study, we extend prior research on the impact of scientific software to the field of LIS, focusing specifically on the use and citation of software that is freely available for academic use. We aim to answer the following questions:

  1. How important is software to LIS research?

  2. How is software, in particular software freely available for academic use, used and cited in LIS research?

  3. To what extent do LIS researchers cite software as recommended by its developers?

The answers to these questions will provide a fuller understanding of the importance of software to scientific research and reveal a more complete and detailed picture of software citation practices. As the first empirical study focusing on the use of software in the LIS literature, it will also shed light on the influence of scientific software on LIS specifically. Additionally, this study explores the discrepancy between LIS researchers’ actual citation practices and the best practices proposed by software developers. Reasons for this inconsistency are identified, with a view to improving the efficacy of software use and scholarly communication.

Section snippets

Data source

Thirteen LIS journals (Appendix A) were selected from a list of 16 journals used in a previous study of the cognitive structure of LIS (Milojević, Sugimoto, Yan, & Ding, 2011). That set of 16 had itself been highly selective, drawn from a list of important LIS journals rated by deans of American Library Association-accredited education programs and by directors of Association of Research Libraries member libraries (Nisonger & Davis, 2005). Three journals were discarded from the previous list of 16: Annual…
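To make the sampling design concrete, the sketch below draws a proportionate stratified sample of articles by journal and year. This is a minimal illustration, not the authors’ code: the article does not detail its sampling procedure here, and the data structure, function name, and seed are assumptions.

    import random

    # Illustrative sketch (assumed data structure): `articles` is a list of
    # dicts with 'journal' and 'year' keys, covering the 3,950 research
    # articles published in the 13 LIS journals in 2008, 2011, 2014, and 2017.
    def stratified_sample(articles, target_size, seed=2017):
        random.seed(seed)
        strata = {}
        for article in articles:
            strata.setdefault((article["journal"], article["year"]), []).append(article)
        total = len(articles)
        sample = []
        for members in strata.values():
            # Each (journal, year) stratum contributes in proportion to its
            # size; rounding can make the total differ slightly from target_size.
            quota = round(target_size * len(members) / total)
            sample.extend(random.sample(members, min(quota, len(members))))
        return sample

    # e.g., sample = stratified_sample(all_articles, target_size=572)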

How important is software to LIS research?

Among the 572 LIS journal articles we surveyed, 153 (27%) explicitly mentioned and used software. Compared with the proportion of articles mentioning software (65%) reported in a previous study of 90 biology papers (Howison & Bullard, 2016), the proportion of articles using software in LIS is small. It should be noted that articles mentioning but not actually using software were not counted in our study; this might be one reason for the smaller proportion. By year, the use…
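The reported shares are simple arithmetic over the counts given above; the following snippet, using only figures stated in the text, reproduces them:

    # Check the reported proportions from the counts given in the text.
    lis_with_software = 153  # LIS articles that explicitly mention and use software
    lis_total = 572
    print(f"LIS share: {lis_with_software / lis_total:.1%}")  # 26.7%, reported as 27%

    bio_share = 0.65  # biology papers mentioning software (Howison & Bullard, 2016)
    print(f"Biology share: {bio_share:.0%}")  # 65%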

Discussion and conclusion

This study examines the importance of software to LIS research, as well as the use and citation of software that is freely available for academic use in the scientific literature. It also explores the degree to which software citation instructions are promulgated and followed. We first selected a sample of 572 articles from the 3,950 research articles published in 13 LIS journals in 2008, 2011, 2014, and 2017, then performed content analysis to identify software packages as well as…
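The reference list includes ReCal2, Freelon’s reliability calculator for two coders, which suggests the content analysis was double-coded and checked for intercoder agreement. The sketch below shows the kind of statistics such a tool reports, namely percent agreement and Cohen’s kappa; the coding vectors are hypothetical, not the study’s data.

    from collections import Counter

    def percent_agreement(coder1, coder2):
        # Share of units on which the two coders assign the same code.
        return sum(a == b for a, b in zip(coder1, coder2)) / len(coder1)

    def cohens_kappa(coder1, coder2):
        # Cohen's kappa: observed agreement corrected for chance agreement.
        n = len(coder1)
        p_o = percent_agreement(coder1, coder2)
        freq1, freq2 = Counter(coder1), Counter(coder2)
        # Chance agreement: both coders independently pick the same category.
        p_e = sum(freq1[k] * freq2[k] for k in set(coder1) | set(coder2)) / (n * n)
        return (p_o - p_e) / (1 - p_e)

    # Hypothetical binary coding of ten articles (1 = "uses software"):
    ratings_a = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
    ratings_b = [1, 0, 1, 0, 0, 0, 1, 0, 1, 1]
    print(percent_agreement(ratings_a, ratings_b))  # 0.8
    print(cohens_kappa(ratings_a, ratings_b))       # ≈ 0.6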

Author contributions

Xuelian Pan: Conceived and designed the analysis; Collected the data; Contributed data or analysis tools; Performed the analysis; Wrote the paper.

Erjia Yan: Performed the analysis; Wrote the paper.

Ming Cui: Collected the data; Contributed data or analysis tools.

Weina Hua: Conceived and designed the analysis; Collected the data; Performed the analysis.

Acknowledgments

This work was funded by the National Natural Science Foundation of China (Grant No. 71704077). We are also grateful to the reviewers for their very helpful comments.

References (47)

  • Chao, T.C. (2011). Disciplinary reach: Investigating the impact of dataset reuse in the earth sciences. Proceedings of the ASIST.

  • Cui, M., et al. (2018). Software usage and citation in the field of library and information science in China. Journal of Library Science in China.

  • Fanelli, D. (2010). Do pressures to publish increase scientists’ bias? An empirical support from US states data. PLOS ONE.

  • Freelon, D. (2010a). ReCal: Intercoder reliability calculation as a web service. International Journal of Internet Science.

  • Freelon, D. (2010b). ReCal2: Reliability for 2 coders [Computer software]. Retrieved from...

  • Hafer, L., et al. (2009). Assessing open source software as a scholarly contribution. Communications of the ACM.

  • Hannay, J.E., et al. (2009). How do scientists develop and use scientific software? Proceedings of the 2009 ICSE Workshop on Software Engineering for Computational Science and Engineering (SECSE 2009).

  • Howison, J., et al. (2016). Software in the scientific literature: Problems with seeing, finding, and using software mentioned in the biology literature. Journal of the Association for Information Science and Technology.

  • Howison, J., et al. (2015). Understanding the scientific software ecosystem and its impact: Current and future measures. Research Evaluation.

  • Howison, J., et al. (2011). Scientific software production: Incentives and collaboration. Proceedings of the ACM 2011 Conference on Computer Supported Cooperative Work.

  • Howison, J., & Herbsleb, J. (2014). The sustainability of scientific software: Ecosystem context and science policy. ...

  • Huang, X., et al. (2013). Meanings and boundaries of scientific software sharing. Proceedings of the 2013 Conference on Computer Supported Cooperative Work (CSCW).

  • Huang, Y.H., et al. (2015). Citing a data repository: A case study of the Protein Data Bank. PLOS ONE.