How important is software to library and information science research? A content analysis of full-text publications
Introduction
In the current scientific reward system, scientists’ impact is largely assessed via their publication history. This tendency has driven scientists to pursue publications as an end product of their research (Fanelli, 2010; Jacob & Lefgren, 2011; Wang, Liu, Ding, & Wang, 2012). Non-publication outputs, such as data and software, have long been underestimated in comparison with publications (Belter, 2014; Hafer & Kirkpatrick, 2009; Poisot, 2015). However, recent years have witnessed the production of more and more non-publication outputs (e.g., scientific data and software), which have played an increasingly important role in advancing scientific theory and practice (Belter, 2014; Chao, 2011; Howison, Deelman, McLennan, Da Silva, & Herbsleb, 2015). As the importance of non-publication outputs is increasingly recognized, some funding agencies, such as the U.S. National Science Foundation and the Higher Education Funding Council for England, have begun to include software, research datasets, and other non-traditional outputs in their consideration of investigators’ intellectual contributions (National Science Foundation, 2013; Research Excellence Framework, 2013).
Among the non-publication research outputs, scientific data has attracted the most academic attention because of the widespread recognition that “science is becoming data-intensive and collaborative” (National Science Foundation, 2010; Tenopir et al., 2011). Many researchers have invested considerable effort into the study of scientific data from numerous perspectives, such as data sharing and reuse, data curation, and data citation (Altman, Borgman, Crosas, & Matone, 2015; Mooney & Newton, 2012; Nelson, 2009; Piwowar & Vision, 2013; Wallis, Rolando, & Borgman, 2013; Witt, Carlson, Brandt, & Cragin, 2009). Compared with scientific data, scientific software has garnered less attention from the academic community and has not been widely valued as an academic contribution. Software has long been considered a supporting service (Howison & Herbsleb, 2014), owing to the wide use of commercial software. However, the open source movement has produced vast quantities of free software, much of which has found extensive use in the scientific community in recent years (Huang et al., 2013; Pan, Yan, Wang, & Hua, 2015). Moreover, a substantial proportion of scientists spend a considerable amount of their own research time developing software tools to facilitate their research (Poisot, 2015; Prabhu et al., 2011); in many cases, these tools are then made publicly available (Hannay et al., 2009; Nguyen-Hoan, Flint, & Sankaranarayana, 2010). There is evidence to suggest that these developers are concerned with the use and impact of their software (Trainer, Chaihirunkarn, Kalyanasundaram, & Herbsleb, 2015), and that scientific end users are also interested in knowing what software others have used (Howison et al., 2015; Huang et al., 2013). Thus, some scholars have begun to investigate the use and impact of software in scientific publications (e.g., Li, Yan, & Feng, 2017; Pan, Yan, & Hua, 2016).
Some such studies have focused on biology research, where researchers have established the important role that software plays in biological research (Howison & Bullard, 2016; Yang, Rousseau, Wang, & Huang, 2018). Other studies have explored the use and impact of particular software tools, such as R, CiteSpace, HistCite, and VOSviewer, in scientific publications; these software tools have likewise been found to have a substantial impact on scientific research (Li et al., 2017; Pan, Yan, Cui, & Hua, 2018). To date, however, few studies have quantified the impact of scientific software on library and information science (LIS) research. One previous study investigated the proportion of LIS articles containing computing terms in the title, abstract, or keywords based on a terminology list, finding that about two thirds of articles post-2000 made mention of computing technologies (Thelwall & Maflahi, 2015). However, this study did not analyze software as a single object, distinct from other technological terms and resources. The present study fills this gap by examining the extent to which scientific software is explicitly mentioned and used in full-text LIS articles. For this study, scientific software is defined broadly as software used for scientific purposes, including software designed purely to facilitate research work and software meant to deal with other kinds of work (e.g., office software). Because software that is not explicitly named cannot be annotated consistently, this study focused on software that is explicitly mentioned in the articles—that is, software used but not mentioned in the articles was ignored. For instance, if a study stated that “a program was written to process the text of each video's title and description,” that program was not included in our analysis.
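The study's coding of software mentions was performed manually via content analysis; purely as an illustration of the "explicit mention" criterion described above, a naive automated scan might look like the following sketch (the tool list is drawn from the article's own examples and is not the study's actual codebook):

```python
import re

# Tool names taken from examples given in the article itself.
KNOWN_TOOLS = ["R", "CiteSpace", "HistCite", "VOSviewer", "SciMAT", "Weka", "Sci2 Tool"]

def explicit_mentions(text):
    """Return the known tools explicitly named in the text (whole-word match)."""
    return [t for t in KNOWN_TOOLS
            if re.search(rf"\b{re.escape(t)}\b", text)]

# An unnamed in-house program is NOT counted, per the study's criterion:
print(explicit_mentions("a program was written to process each video's title"))
# Named tools ARE counted:
print(explicit_mentions("We mapped the field with VOSviewer and CiteSpace"))
```

The first call returns an empty list, mirroring the article's rule that software used but never named is ignored; the second returns the two explicitly named tools.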
Citation count, often used to assess the impact of publications and data (Belter, 2014; Cartes-Velásquez & Manterola Delgado, 2014), seems to be suitable for measuring the impact of software as well. However, our previous study found that more than 40% of software tools used in PLOS ONE articles received no formal citations (Pan et al., 2015). Howison and Bullard (2016) have likewise found that 56% of software mentions in the biology literature did not include a formal citation. Software “uncitedness” has also been shown to be prevalent in bioinformatics papers (Yang et al., 2018). Taken together, these earlier studies demonstrate that a considerable proportion of software tools are not formally cited in scientific publications. As yet, however, little is known about the extent to which software freely available for academic use is cited in the scientific literature. In this article, we define freely available software as software that can be obtained for academic use without payment, including open source software (e.g., SciMAT and Weka) and non-open source software that is free for academic use (e.g., Sci2 Tool and CiteSpace). Earlier evidence has suggested that extrinsic benefits, such as citations and career advancement, motivate scientists to develop and share software (Howison & Herbsleb, 2011; Roberts, Hann, & Slaughter, 2006). A study focusing on software that is freely available for academic use will thus illuminate the extent to which developers receive credit for software development and sharing.
Considering that many researchers cite publications but fail to cite software, some scholars have proposed alternative metrics, in addition to citation count, as a means of evaluating the impact of software. They suggest that the number of mentions, downloads, users, registered users, user messages, and user reviews can be used as indicators for measuring this impact (Howison et al., 2015; Pan et al., 2016; Thelwall & Kousha, 2016; Zhao & Wei, 2017). These indicators are no doubt useful, but accurate data on some of them are difficult to collect. For instance, if a software tool that can be downloaded without payment or registration is distributed via multiple websites, the user count is hard to obtain. Moreover, some of these indicators may provide a biased picture of the academic impact of scientific software. For example, some users may download a software tool multiple times without ever using it in their research. Given these limitations, other scholars hold that a greater effort must be made to improve the practice of software citation—e.g., by creating software citation principles and developing tools to support software citation (Smith, Katz, & Niemeyer, 2016; Soito & Hwang, 2016). Certainly, much work remains to be done to improve the practice of software citation and the efficacy of research evaluation.
In this study, we extend existing studies on the impact of scientific software to the field of LIS, focusing specifically on the use and citation of software that is freely available for academic use. We aim to answer the following questions:
- 1. How important is software to LIS research?
- 2. How is software—in particular, software freely available for academic use—used and cited in LIS research?
- 3. To what extent do LIS researchers cite software as recommended by its developers?
The answers to the above questions will provide a fuller understanding of the importance of software to scientific research and reveal a more complete and detailed landscape of software citation practices. As the first empirical study focusing on the use of software in the LIS literature, this study will also give a better understanding of the influence of scientific software on LIS specifically. Additionally, this study explores the discrepancy between LIS researchers’ actual citation practices and those proposed as best practices by software developers. Reasons for this lack of consistency are identified, with a view to improving the efficacy of software use and scholarly communications.
Data source
Thirteen LIS journals (Appendix A) were selected from a list of 16 journals used by a previous study on the cognitive structure of LIS (Milojević, Sugimoto, Yan, & Ding, 2011). The set of 16 had itself been highly selective, drawn from a list of important LIS journals rated by American Library Association-accredited education program deans and Association of Research Libraries member library directors (Nisonger & Davis, 2005). Three journals were discarded from the previous list of 16: Annual
How important is software for LIS research?
Among the 572 LIS journal articles we surveyed, 153 (27%) explicitly mentioned and used software. Compared to the reported proportion of articles mentioning software (65%) in a previous study on 90 biology papers (Howison & Bullard, 2016), the proportion of articles using software in the field of LIS is small. It should be noted that articles mentioning but not actually using software were not taken into account in our study; this might be one reason for the smaller proportion. By year, the use
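The headline figure can be reproduced directly from the counts reported above; a minimal check:

```python
# Counts taken from the text: 153 of 572 sampled LIS articles
# explicitly mentioned and used software.
articles_surveyed = 572
articles_using_software = 153

share = articles_using_software / articles_surveyed
print(f"{share:.1%}")  # 26.7%, reported rounded as 27%
```

The 65% comparison figure from Howison and Bullard (2016) rests on a looser criterion (mere mentions, in a sample of 90 biology papers), which, together with the exclusion of mention-only articles noted above, partly explains the gap.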
Discussion and conclusion
This study examines the importance of software to LIS research as well as the use and citation of software that is freely available for academic use in the scientific literature. Moreover, this article explores the degree to which software citation instructions are promulgated and followed. We first selected a sample of 572 articles from the 3950 research articles published in 13 LIS journals in 2008, 2011, 2014, and 2017, then performed content analysis to identify software packages as well as
Author contributions
Xuelian Pan: Conceived and designed the analysis; Collected the data; Contributed data or analysis tools; Performed the analysis; Wrote the paper.
Erjia Yan: Performed the analysis; Wrote the paper.
Ming Cui: Collected the data; Contributed data or analysis tools.
Weina Hua: Conceived and designed the analysis; Collected the data; Performed the analysis.
Acknowledgments
This work was funded by the National Natural Science Foundation of China (Grant No. 71704077). We are also grateful to the reviewers for their very helpful comments.
References
- Jacob & Lefgren (2011). The impact of research grant funding on scientific productivity. Journal of Public Economics.
- Li, Yan, & Feng (2017). How is R cited in research outputs? Structure, impacts, and citation standard. Journal of Informetrics.
- Pan, Yan, Cui, & Hua (2018). Examining the usage, citation, and diffusion patterns of bibliometric mapping software: A comparative study of three tools. Journal of Informetrics.
- Pan, Yan, Wang, & Hua (2015). Assessing the impact of software on science: A bootstrapped learning of software entities in full-text papers. Journal of Informetrics.
- Thelwall & Maflahi (2015). How important is computing technology for library and information science research? Library & Information Science Research.
- Altman (1990). Practical statistics for medical research.
- Altman, Borgman, Crosas, & Matone (2015). An introduction to the joint principles for data citation. Bulletin of the American Society for Information Science and Technology.
- American Psychological Association (2010). Publication manual of the American Psychological Association.
- Belter (2014). Measuring the value of research data: A citation analysis of oceanographic data sets. PLOS ONE.
- Cartes-Velásquez & Manterola Delgado (2014). Bibliometric analysis of articles published in ISI dental journals, 2007–2011. Scientometrics.