Abstract
In this study an attempt is made to establish new bibliometric indicators for the assessment of research in the Humanities. Data from a Dutch Faculty of Humanities was used to give the investigation a sound empirical basis. For several reasons (particularly related to coverage) the standard citation indicators, developed for the sciences, are unsatisfactory. Target-expanded citation analysis and the use of oeuvre (lifetime) citation data, together with library holdings and productivity indicators, enable a more representative and fair assessment. Given the skewed distribution of the population data, individual rankings are best determined from log-transformed data. For group rankings this is less urgent because of the central limit theorem. Lifetime citation data is corrected for professional age by means of exponential regression.
Acknowledgments
I wish to express my gratitude to the Executive Board of Leiden University, and especially its former Vice-Rector magnificus Professor Ton van Haaften, for the opportunity to carry out this study. I am indebted to Professors Geert Booij and Wim van der Doel, Deans of the Leiden Faculty of Humanities, and their staff, and to Piet van Slooten, Director of Academic Affairs at Leiden University, and his staff for their encouragement and support. The project would not have been possible without Professor Anthony van Raan, Director of CWTS, who offered the stimulating environment of his institute and who read the manuscript. Henk Moed, Ton Nederhof, Martijn Visser, and my other colleagues at the CWTS helped me by commenting on parts of the preliminary report and by supplying extra data. I am grateful to Henk Moed for encouraging me to investigate library catalogues as a bibliometric source. I thank the peer reviewers for their helpful comments.
Appendix: mathematics of PA-correction for citations
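Starting from the data as it is plotted in the scatter diagram of citation rates per author over professional age (PA) (Fig. 2, main text), we can PA-correct the citation rates by using the exponential regression \( y = \alpha e^{\beta x} \):
\[ ce(y_{k}) = \frac{y_{k}}{\alpha e^{\beta k}} \tag{1a} \]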
where \( ce(y_{k}) \) denotes the exponentially corrected citation score of author y with professional age k. The lower bound of all points \( ce(y_{k}) \) is 0, and their geometric mean (GM) is \( \bar{y}_{\text{GM}} / (\alpha e^{\beta \bar{x}}) = 1. \) We call the Cartesian point \( (\bar{x}, \alpha e^{\beta \bar{x}}) = (\bar{x}, \bar{y}_{\text{GM}}) \) the geometric centre of gravity.
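The exponential correction described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's actual procedure: it assumes the exponential regression is fitted by ordinary least squares on the natural log of the citation counts (which is consistent with the statement that the GM of the corrected scores equals 1), it assumes every author has at least one citation so that the log is defined, and the data and the function names `fit_exponential` and `ce` are hypothetical.

```python
import numpy as np


def fit_exponential(pa, citations):
    """Estimate alpha and beta in y = alpha * exp(beta * x).

    Assumption: the fit is ordinary least squares on log(y), i.e. the
    straight line log(y) = log(alpha) + beta * x.
    """
    # np.polyfit returns coefficients from highest degree down: [slope, intercept].
    beta, log_alpha = np.polyfit(pa, np.log(citations), deg=1)
    return float(np.exp(log_alpha)), float(beta)


def ce(pa, citations, alpha, beta):
    """Exponentially corrected score: citation count divided by the fitted value."""
    return citations / (alpha * np.exp(beta * pa))


# Hypothetical data: professional age in years and lifetime citation counts.
pa = np.array([3, 5, 8, 12, 15, 20, 25, 30], dtype=float)
citations = np.array([2, 4, 3, 10, 25, 18, 60, 45], dtype=float)

alpha, beta = fit_exponential(pa, citations)
scores = ce(pa, citations, alpha, beta)

# Geometric mean of the corrected scores; equals 1 up to rounding error
# because least-squares residuals on the log scale sum to zero.
geometric_mean = float(np.exp(np.mean(np.log(scores))))
print(alpha, beta, geometric_mean)
```

A corrected score above 1 then means the author is cited more than the fitted curve predicts for that professional age, and a score below 1 means less.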
Alternatively, PA-correction can be based on the linear regression \( y = \gamma + \delta x \) (cl standing for linear correction):
\[ cl(y_{k}) = y_{k} - (\gamma + \delta k) \tag{2a} \]
The arithmetic mean of all scores \( cl(y_{k}) \) is \( \bar{y} - (\gamma + \delta \bar{x}) = 0. \) The centre of gravity in the (x, y)-plane is the Cartesian point \( (\bar{x},\gamma + \delta \bar{x}) = (\bar{x},\bar{y}). \)
Since there are advantages in working with linear data, we may translate (1a) into linear form by taking logs. Because \( \log \left[ ce(y_{k}) \right] = cl(\log y_{k}), \) we thus apply the following log transformation of (1a):
\[ cl(\log y_{k}) = \log y_{k} - (\log \alpha + \beta k) \tag{1b} \]
The arithmetic mean of all \( cl(\log y_{k}) \) is \( \overline{\log y} - (\log \alpha + \beta \bar{x}) = 0. \)
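The identity between the log of the exponentially corrected score and the linearly corrected log score can be checked numerically. The sketch below uses hypothetical data and assumes natural logarithms and a least-squares fit on the log scale; neither choice is stated in the text.

```python
import numpy as np

# Hypothetical data: professional age in years and lifetime citation counts.
pa = np.array([3, 5, 8, 12, 15, 20, 25, 30], dtype=float)
citations = np.array([2, 4, 3, 10, 25, 18, 60, 45], dtype=float)

# Assumed fit: least squares of log(y) on x gives beta (slope) and log(alpha) (intercept).
beta, log_alpha = np.polyfit(pa, np.log(citations), deg=1)
alpha = np.exp(log_alpha)

# Left-hand side: log of the exponentially corrected score, log[y / (alpha * e^(beta * k))].
log_of_ce = np.log(citations / (alpha * np.exp(beta * pa)))

# Right-hand side: linear correction applied to log(y), log(y) - (log(alpha) + beta * k).
cl_of_log = np.log(citations) - (log_alpha + beta * pa)

identity_holds = bool(np.allclose(log_of_ce, cl_of_log))
mean_cl_of_log = float(cl_of_log.mean())  # zero up to rounding error
print(identity_holds, mean_cl_of_log)
```

The mean is zero because a least-squares line with an intercept always passes through the point of means, here \( (\bar{x}, \overline{\log y}) \).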
So far we have two linearly corrected scores to work with, (1b) and (2a), which, if we take their origin into account, are fundamentally different in spite of their external similarity, since (1b) is indirectly based on exponential correction of the original counts, while (2a) is directly based on linear correction of the same counts. Hence, both (1a) and (1b) can be seen as representing a model based on exponential correction, while (2a) provides a model based on linear correction.
Finally we standardize scores (1b) and (2a), so that they have not only mean 0 but also standard deviation 1. This is done by applying the standardizing operation \( z = (x - \bar{x})/s, \) where s is the standard deviation of all x. Recall that a different type of standardizing procedure was applied above with regard to (1a): there we calculated the ratio \( x/\bar{x}, \) which has mean 1 and lower bound 0, without standardizing the deviation.
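Consequently, the standardized PA-corrected scores for the exponential and the linear model, corresponding with (1b) and (2a), are respectively:
\[ \log z_{k} = \frac{\log y_{k} - (\log \alpha + \beta k)}{s} \tag{1c} \]
and
\[ z_{k} = \frac{y_{k} - (\gamma + \delta k)}{s} \tag{2b} \]
where s is in each case the standard deviation of the corrected scores in the numerator.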
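We can add the following standardized scores without PA-correction:
\[ z = \frac{y - \bar{y}}{s} \tag{3a} \]
and
\[ \log z = \frac{\log y - \overline{\log y}}{s} \tag{3b} \]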
Group means (g standing for group) \( \bar{z}_{kg}, \overline{\log z}_{kg}, \bar{z}_{g}, \overline{\log z}_{g} \) (denoted in the main text by \( \bar{z}_{x}, \overline{\log z}_{x}, \bar{z}, \overline{\log z} \)) are obtained by averaging the individual scores (2b), (1c), (3a) and (3b), respectively, over all members of the group in question. Note that the parameters α, β, γ, δ, the means \( \bar{y} \) and \( \overline{\log y}, \) and the standard deviation s are derived from the reference group. In our case the reference group is always the union of the groups (samples) for which the group means are computed.
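The standardization and group averaging for the exponential model, i.e. score (1c) and its group mean, can be sketched as follows. The data, the group labels "A" and "B", and the function names are hypothetical. The sketch also assumes natural logarithms, a least-squares fit on the log scale, and the sample standard deviation (`ddof=1`); the text does not say whether the sample or the population form of s is meant.

```python
import numpy as np


def standardize(values):
    """z = (x - mean) / s, using the sample standard deviation (an assumption)."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std(ddof=1)


def pa_corrected_log_z(pa, citations):
    """Standardized PA-corrected log scores for the whole reference group."""
    log_y = np.log(citations)
    # Regression parameters are estimated on the reference group, here all authors.
    beta, log_alpha = np.polyfit(pa, log_y, deg=1)
    corrected = log_y - (log_alpha + beta * pa)
    return standardize(corrected)


def group_means(scores, groups):
    """Average the individual scores over the members of each group."""
    return {str(g): float(scores[groups == g].mean()) for g in np.unique(groups)}


# Hypothetical reference group: the union of two groups of four authors each.
pa = np.array([3, 5, 8, 12, 15, 20, 25, 30], dtype=float)
citations = np.array([2, 4, 3, 10, 25, 18, 60, 45], dtype=float)
groups = np.array(["A", "B", "A", "B", "A", "B", "A", "B"])

log_z_k = pa_corrected_log_z(pa, citations)
means = group_means(log_z_k, groups)
print(means)
```

Because the reference group is the union of the groups being compared, the size-weighted average of the group means is 0, so with two equally sized groups the two means are equal in magnitude and opposite in sign.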
By the central limit theorem (CLT), the sampling distributions of the aforementioned group means are approximately normal with μ = 0. If we use the geometric scores of (1a) instead, the sampling distribution is lognormal (μ = 1).
Linmans, A.J.M. Why with bibliometrics the Humanities does not need to be the weakest link. Scientometrics 83, 337–354 (2010). https://doi.org/10.1007/s11192-009-0088-9
DOI: https://doi.org/10.1007/s11192-009-0088-9