Why with bibliometrics the Humanities does not need to be the weakest link

Indicators for research evaluation based on citations, library holdings, and productivity measures

Scientometrics

Abstract

This study attempts to establish new bibliometric indicators for the assessment of research in the Humanities. Data from a Dutch Faculty of Humanities were used to give the investigation a sound empirical basis. For several reasons, particularly related to coverage, the standard citation indicators developed for the sciences are unsatisfactory. Target-expanded citation analysis and the use of oeuvre (lifetime) citation data, together with the addition of library holdings and productivity indicators, enable a more representative and fair assessment. Given the skewed distribution of the population data, individual rankings can best be determined from log-transformed data. For group rankings this is less urgent because of the central limit theorem. Lifetime citation data are corrected for professional age by means of exponential regression.

Acknowledgments

I wish to express my gratitude to the Executive Board of Leiden University, and especially its former Vice-Rector Magnificus Professor Ton van Haaften, for the opportunity to carry out this study. I am indebted to Professors Geert Booij and Wim van der Doel, Deans of the Leiden Faculty of Humanities, and their staff, and to Piet van Slooten, Director of Academic Affairs at Leiden University, and his staff for their encouragement and support. The project would not have been possible without Professor Anthony van Raan, Director of CWTS, who offered the stimulating environment of his institute and who read the manuscript. Henk Moed, Ton Nederhof, Martijn Visser, and my other colleagues at the CWTS helped me by commenting on parts of the preliminary report and by supplying extra data. I am grateful to Henk Moed for encouraging me to investigate library catalogues as a bibliometric source. I thank the peer reviewers for their helpful comments.

Author information

Correspondence to A. J. M. Linmans.

Appendix: mathematics of PA-correction for citations

Starting from the data as plotted in the scatter diagram of citation rates per author against professional age (PA; Fig. 2, main text), we can PA-correct the citation rates by using the exponential regression:

$$ ce(y_{k} ) = {\frac{{y_{k} }}{{E(y_{k} )}}} = {\frac{{y_{k} }}{{\alpha e^{{\beta x_{k} }} }}}, $$
(1a)

where \( ce(y_k) \) denotes the exponentially corrected citation score of author k, with citation score \( y_k \) and professional age \( x_k \). The lower bound of all points \( ce(y_k) \) is 0, and their geometric mean (GM) is \( \bar{y}_{\text{GM}} / (\alpha e^{\beta \bar{x}}) = 1. \) We call the Cartesian point \( (\bar{x}, \alpha e^{\beta \bar{x}}) = (\bar{x}, \bar{y}_{\text{GM}}) \) the geometric centre of gravity.
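As a minimal sketch of correction (1a), the following Python code fits the exponential curve and divides each score by its expected value. It assumes that α and β are estimated by ordinary least squares on log y (the appendix does not state the fitting method, but this choice is what makes the geometric mean of the corrected scores equal 1) and that all citation scores are strictly positive. The function names and the data are hypothetical.

```python
import math

def fit_exponential(x, y):
    """Fit y ~ alpha * exp(beta * x) by least squares on log(y) (assumed method)."""
    n = len(x)
    log_y = [math.log(v) for v in y]          # requires y > 0
    x_bar = sum(x) / n
    ly_bar = sum(log_y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (li - ly_bar) for xi, li in zip(x, log_y))
    beta = sxy / sxx
    alpha = math.exp(ly_bar - beta * x_bar)
    return alpha, beta

def exp_corrected(x, y):
    """Return ce(y_k) = y_k / (alpha * exp(beta * x_k)) for every author k."""
    alpha, beta = fit_exponential(x, y)
    return [yi / (alpha * math.exp(beta * xi)) for xi, yi in zip(x, y)]

def geometric_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical data: professional age in years and lifetime citation counts.
pa = [3, 8, 12, 20, 27, 35]
cites = [2, 9, 7, 40, 55, 160]

ce = exp_corrected(pa, cites)
print(geometric_mean(ce))  # equals 1 up to floating-point rounding
```

A corrected score above 1 then indicates an author cited more than expected for his or her professional age, and a score below 1 an author cited less.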

Alternatively, PA-correction can be based on the linear regression (cl standing for linear correction):

$$ cl(y_{k} ) = y_{k} - E(y_{k} ) = y_{k} - (\gamma + \delta x_{k} ). $$
(2a)

The arithmetic mean of all scores \( cl(y_k) \) is \( \bar{y} - (\gamma + \delta \bar{x}) = 0. \) The centre of gravity in the (x, y)-plane is the Cartesian point \( (\bar{x},\gamma + \delta \bar{x}) = (\bar{x},\bar{y}). \)
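The linear correction (2a) can be sketched in the same way. The code below assumes that γ and δ are the ordinary least-squares intercept and slope, which is what places the regression line through the centre of gravity; the function names and the data are hypothetical.

```python
def fit_linear(x, y):
    """Least-squares line y ~ gamma + delta * x (assumed fitting method)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    delta = sxy / sxx
    gamma = y_bar - delta * x_bar
    return gamma, delta

def lin_corrected(x, y):
    """Return cl(y_k) = y_k - (gamma + delta * x_k) for every author k."""
    gamma, delta = fit_linear(x, y)
    return [yi - (gamma + delta * xi) for xi, yi in zip(x, y)]

# Hypothetical data: professional age in years and lifetime citation counts.
pa = [3, 8, 12, 20, 27, 35]
cites = [2, 9, 7, 40, 55, 160]

cl = lin_corrected(pa, cites)
gamma, delta = fit_linear(pa, cites)
mean_cl = sum(cl) / len(cl)                        # 0 up to rounding
line_at_mean_pa = gamma + delta * sum(pa) / len(pa)  # equals the mean of cites
```

Unlike the ratio scores of (1a), these difference scores are unbounded below and are centred on 0 rather than 1.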

Since there are advantages in working with linear data, we may translate (1a) into linear form by taking logs. Because \( \log \left[ {ce(y_{k} )} \right] = cl(\log \,y_{k} ), \) we thus apply the following log transformation of (1a):

$$ cl(\log \,y_{k} ) = \log \,y_{k} - (\log \,\alpha \, + \,\beta \,x_{k} ). $$
(1b)

The arithmetic mean of all \( cl(\log y_k) \) is \( \overline{\log y} - (\log \alpha + \beta \bar{x}) = 0. \)
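Both the identity behind (1b) and its zero mean can be written out in two lines. The second line assumes that α and β are the least-squares estimates from the regression of log y on x over n authors, so that the fitted line passes through \( (\bar{x}, \overline{\log y}) \); the symbol n is introduced here only for illustration.

$$ \log \left[ ce(y_{k}) \right] = \log \frac{y_{k}}{\alpha e^{\beta x_{k}}} = \log y_{k} - (\log \alpha + \beta x_{k}) = cl(\log y_{k}), $$

$$ \frac{1}{n} \sum_{k=1}^{n} cl(\log y_{k}) = \overline{\log y} - (\log \alpha + \beta \bar{x}) = 0, \quad \text{because } \log \alpha = \overline{\log y} - \beta \bar{x}. $$

Exponentiating the second line gives the geometric mean of the scores \( ce(y_k) \), which is therefore \( e^{0} = 1. \)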

So far we have two linearly corrected scores to work with, (1b) and (2a), which, if we take their origin into account, are fundamentally different in spite of their external similarity, since (1b) is indirectly based on exponential correction of the original counts, while (2a) is directly based on linear correction of the same counts. Hence, both (1a) and (1b) can be seen as representing a model based on exponential correction, while (2a) provides a model based on linear correction.

Finally we standardize scores (1b) and (2a), so that they have not only mean 0 but also standard deviation 1. This is done by applying the standardizing operation \( z = (x - \bar{x})/s, \) where s is the standard deviation of all x. Recall that a different type of standardizing procedure was applied above to (1a): there we calculated the ratio \( x/\bar{x} \) with mean 1 and lower bound 0, without standardizing the deviation.

Consequently, the standardized PA-corrected scores for the exponential and the linear model, corresponding with (1b) and (2a), are respectively:

$$ \log \,z_{k} = {\frac{{\log \,y_{k} - (\log \,\alpha \, + \,\beta x_{k} )}}{s}}, $$
(1c)

and

$$ z_{k} = {\frac{{y_{k} - (\gamma + \delta x_{k} )}}{s}}. $$
(2b)

We can add the following standardized scores without PA-correction:

$$ z = {\frac{{y - \bar{y}}}{s}}, $$
(3a)

and

$$ \log \,z = {\frac{{\log \,y\, - \,\overline{\log \,y} }}{s}}. $$
(3b)
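The four standardized scores can be computed side by side, as in the following sketch. It assumes least-squares regression for the PA-corrected variants, a population standard deviation for s (the appendix does not say whether the population or the sample form is meant), and strictly positive citation counts so that the logarithm exists. All names and data are hypothetical.

```python
import math
import statistics

def ols(x, y):
    """Return (intercept, slope) of the least-squares line y ~ a + b * x."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def residuals(x, y):
    """Scores minus their regression expectation, as in (2a) and (1b)."""
    a, b = ols(x, y)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

def standardize(values):
    """z = (v - mean) / s, with s the population SD (an assumption)."""
    mean = statistics.fmean(values)
    s = statistics.pstdev(values)
    return [(v - mean) / s for v in values]

def all_scores(pa, cites):
    log_c = [math.log(c) for c in cites]
    return {
        "log_z_k": standardize(residuals(pa, log_c)),  # (1c), exponential model
        "z_k": standardize(residuals(pa, cites)),      # (2b), linear model
        "z": standardize(cites),                       # (3a), no PA-correction
        "log_z": standardize(log_c),                   # (3b), no PA-correction
    }

# Hypothetical data: professional age in years and lifetime citation counts.
pa = [3, 8, 12, 20, 27, 35]
cites = [2, 9, 7, 40, 55, 160]
scores = all_scores(pa, cites)
```

Because the regression residuals already have mean 0, standardizing them reduces to dividing by s, which matches the form of (1c) and (2b).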

Group means (g standing for group) \( \bar{z}_{kg}, \overline{\log z}_{kg}, \bar{z}_{g}, \overline{\log z}_{g} \) (denoted in the main text by \( \bar{z}_{x}, \overline{\log z}_{x}, \bar{z}, \overline{\log z} \)) are obtained by averaging the individual scores (2b), (1c), (3a) and (3b), respectively, over all members of the group in question. Note that the parameters α, β, γ, δ, the means \( \bar{y} \) and \( \overline{\log y} \), and the standard deviation s are derived from the reference group. In our case the reference group is always the union of the groups (samples) for which the group means are computed.
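The role of the reference group can be illustrated with a short sketch for the PA-corrected log scores (1c): the regression parameters and s are estimated once on the union of all groups, and only then are the individual scores averaged per group. The least-squares fit, the population standard deviation, the function name, and the data are assumptions made for illustration.

```python
import math
import statistics

def group_means_log_z_k(groups):
    """groups maps a group name to a list of (professional_age, citations) pairs.

    Returns the mean of the standardized PA-corrected log scores per group,
    with all parameters estimated on the union of the groups.
    """
    members = [(name, pa, math.log(c))
               for name, rows in groups.items() for pa, c in rows]
    xs = [m[1] for m in members]
    ls = [m[2] for m in members]
    x_bar, l_bar = statistics.fmean(xs), statistics.fmean(ls)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (l - l_bar) for x, l in zip(xs, ls))
    beta = sxy / sxx
    log_alpha = l_bar - beta * x_bar
    resid = [l - (log_alpha + beta * x) for x, l in zip(xs, ls)]
    s = statistics.pstdev(resid)  # assumption: population SD of the union
    per_group = {}
    for (name, _, _), r in zip(members, resid):
        per_group.setdefault(name, []).append(r / s)
    return {name: statistics.fmean(v) for name, v in per_group.items()}

# Hypothetical groups of authors: (professional age, lifetime citations).
groups = {
    "A": [(5, 4), (12, 15), (30, 90)],
    "B": [(8, 3), (20, 22), (25, 70), (33, 41)],
}
means = group_means_log_z_k(groups)
sizes = {name: len(rows) for name, rows in groups.items()}
weighted = sum(means[g] * sizes[g] for g in groups) / sum(sizes.values())
```

Since the reference group is the union, the size-weighted average of the group means is 0 by construction, so a positive group mean always implies that some other group lies below the reference level.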

By the central limit theorem (CLT), the sampling distributions of the aforementioned group means are approximately normal with μ = 0. Should we instead use the geometric scores of (1a), then the sampling distribution is a lognormal distribution (μ = 1).
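A small simulation can make the CLT statement concrete. The sketch below draws a hypothetical, strongly skewed reference population, standardizes it, and repeatedly samples groups with replacement. These are simplifying assumptions and do not reproduce the study's data. It checks only the centre and spread of the simulated group means, not the shape of their distribution.

```python
import math
import random
import statistics

def simulate_group_means(group_size, n_groups, seed=0):
    """Means of standardized scores for randomly drawn groups (illustrative)."""
    rng = random.Random(seed)
    # Hypothetical skewed population, loosely resembling citation counts.
    population = [rng.lognormvariate(0.0, 1.0) for _ in range(2000)]
    mean = statistics.fmean(population)
    sd = statistics.pstdev(population)
    z = [(v - mean) / sd for v in population]
    # Each group is a random sample, drawn with replacement, of z-scores.
    return [statistics.fmean(rng.choices(z, k=group_size))
            for _ in range(n_groups)]

group_size = 25
means = simulate_group_means(group_size=group_size, n_groups=5000)
centre = statistics.fmean(means)             # expected to be close to 0
spread = statistics.pstdev(means)            # expected to be close to 1/sqrt(n)
expected_spread = 1 / math.sqrt(group_size)  # 0.2 for groups of 25
```

Under these assumptions the group means cluster around 0 with a standard deviation near \( 1/\sqrt{n} \), which is why the skewness of individual scores matters less for group rankings than for individual rankings.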

About this article

Cite this article

Linmans, A.J.M. Why with bibliometrics the Humanities does not need to be the weakest link. Scientometrics 83, 337–354 (2010). https://doi.org/10.1007/s11192-009-0088-9
