Transformations of basic publication–citation matrices

https://doi.org/10.1016/j.joi.2007.01.004

Abstract

Basic publication–citation matrices are used to calculate informetric indicators such as journal impact factors or R-sequences. Transforming these publication–citation matrices clarifies the construction of other indicators. In this article, some transformations are highlighted together with some of their invariants. Such invariants offer a rigorous, mathematically founded way of comparing informetric matrices before and after a transformation.

Section snippets

Introduction: the basic publication–citation matrix

A basic publication–citation matrix (pc matrix for short) is a table showing the publication and citation data needed to calculate an informetric indicator such as a journal impact factor or an R-sequence (Liang, 2005). Examples of the use of pc matrices can be found in Frandsen and Rousseau (2005), Ingwersen, Larsen, Rousseau, and Russell (2001), and Liang (2005). Data in such matrices are usually shown in chronological order. Publication data may be given either row by row, as we will do in
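A pc matrix can be held in a small container such as the one sketched here. This is an illustrative Python sketch, not the authors' representation: the class name `PCMatrix`, its field names, the 0-based indexing and the assumption that rows and columns cover the same n consecutive years are all hypothetical choices. Data are stored row by row in chronological order, and the example numbers are made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PCMatrix:
    """Hypothetical container for a basic pc matrix.

    years: the n consecutive years, in chronological order
    P:     P[i] = number of publications in years[i]
    C:     C[i][j] = citations received in years[j] by the
           publications of years[i] (0 when j < i)
    """
    years: List[int]
    P: List[float]
    C: List[List[float]]

    def __post_init__(self) -> None:
        n = len(self.years)
        if len(self.P) != n or len(self.C) != n or any(len(r) != n for r in self.C):
            raise ValueError("P needs n entries and C must be n x n")

    def rows(self) -> List[List[float]]:
        # Row-by-row view: publication count followed by the citation row.
        return [[p] + list(r) for p, r in zip(self.P, self.C)]


# Made-up example data for three years.
m = PCMatrix(years=[2000, 2001, 2002], P=[10, 20, 30],
             C=[[1, 4, 6], [0, 2, 5], [0, 0, 3]])
print(m.rows()[0])  # [10, 1, 4, 6]
```

The first column of `rows()` is the publication column P; the remaining columns form the citation (c-)submatrix.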

Generalized impact factors and their relation with rhythm indicators

In Frandsen and Rousseau (2005), a general approach to the notion of an impact factor was introduced. In this framework, an analysis based on several years of publications becomes possible. In traditional synchronous or diachronous citation studies, only one row or one column of the citation submatrix is used. In a synchronous approach the citation period is fixed; in a diachronous approach the publication year (the cited year) stays fixed. The more general Frandsen–Rousseau
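To make the row/column distinction concrete, the following sketch extracts a synchronous column and a diachronous row from a citation submatrix stored as a list of rows (assumed layout: rows are publication years, columns are citation years, 0-based indices). As one example of a synchronous indicator it also computes the classical two-year impact factor. The function names and the `window` parameter are hypothetical, and the generalized Frandsen–Rousseau impact factor itself is not reproduced here.

```python
def synchronous_column(C, j):
    """Citations received in the fixed citation year j, one entry per publication year."""
    return [row[j] for row in C]


def diachronous_row(C, i):
    """Citations received by publication year i over the successive citation years."""
    return list(C[i])


def synchronous_impact_factor(P, C, j, window=2):
    """Classical synchronous impact factor for citation year j: citations in
    year j to the `window` preceding publication years, divided by the number
    of publications in those years."""
    if j - window < 0:
        raise ValueError("not enough earlier publication years")
    cited_years = range(j - window, j)
    return sum(C[i][j] for i in cited_years) / sum(P[i] for i in cited_years)


P = [10, 20, 30]                        # made-up publication counts
C = [[1, 4, 6], [0, 2, 5], [0, 0, 3]]   # made-up citation submatrix
print(synchronous_column(C, 2))         # [6, 5, 3]
print(diachronous_row(C, 0))            # [1, 4, 6]
print(synchronous_impact_factor(P, C, 2))
```

With these made-up numbers the two-year impact factor for the last year is (6 + 5) / (10 + 20).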

The R-transformation

This transformation of a pc matrix M is denoted R. It maps the matrix M to the pc matrix R(M) and leaves the publication column P unchanged: the first column of R(M) is equal to P. R transforms the c-submatrix C, with elements $C_{i,j}$, into the c-submatrix of R(M) with elements $R_{i,j}$, where

$$R_{i,j} = \frac{P_i}{\sum_{k=1}^{n-j+i} P_k} \sum_{s=1}^{n-j+i} C_{s,s+j-i} = P_i \, \frac{\sum_{s=1}^{n-j+i} C_{s,s+j-i}}{\sum_{k=1}^{n-j+i} P_k}$$

The first expression presents the matrix elements $R_{i,j}$ as a weighted sum of citations, the second as the number of publications multiplied
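The formula can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the name `r_transform` is hypothetical, C is taken to be n × n with rows as publication years and columns as citation years (0-based, C[i][j] = 0 for j < i), and every diagonal is assumed to have a positive publication total. With 0-based indices, the diagonal with citation age d = j − i contains rows 0, …, n − d − 1, which corresponds to the upper limit n − j + i in the 1-based formula.

```python
def r_transform(P, C):
    """Sketch of the R-transformation of the citation submatrix.

    For each diagonal d = j - i, the diagonal citation total is redistributed
    over the rows on that diagonal in proportion to their publication counts:
    R[i][i + d] = P[i] * sum_s C[s][s + d] / sum_k P[k].
    P itself is left unchanged and is not returned.
    """
    n = len(P)
    R = [[0.0] * n for _ in range(n)]
    for d in range(n):                # citation age (diagonal offset)
        m = n - d                     # rows 0..m-1 lie on this diagonal
        diag_cites = sum(C[s][s + d] for s in range(m))
        diag_pubs = sum(P[k] for k in range(m))   # assumed > 0
        for i in range(m):
            R[i][i + d] = P[i] * diag_cites / diag_pubs
    return R


P = [10, 20, 30]                        # made-up data
C = [[1, 4, 6], [0, 2, 5], [0, 0, 3]]
print(r_transform(P, C))  # [[1.0, 3.0, 6.0], [0.0, 2.0, 6.0], [0.0, 0.0, 3.0]]
```

Summing the formula over the rows on one diagonal gives back the diagonal citation total, so R preserves each diagonal sum (6, 9 and 6 in this example) and hence the total number of citations.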

The AV-transformation

The more (citable) articles a journal, an institute, etc. publishes, the larger its citation potential. To take this effect into account, one may use the following averaging transformation, denoted AV. Applied to the matrix M, it yields the matrix AV(M) = A. The elements of AV(M) are equal to the elements of M divided by $P_i$, the first element of the ith row of M:

$$A_{i,j} = \frac{M_{i,j}}{P_i}$$

In particular, we see that the first column of A consists of ones. The citation part of the AV-transformed matrix
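A minimal Python sketch of the AV-transformation, assuming the pc matrix is stored row by row with the publication count $P_i$ as the first entry of row i. The function name `av_transform` is hypothetical, the data are made up, and every $P_i$ is assumed to be positive.

```python
def av_transform(M):
    """AV-transformation sketch. M is a pc matrix given row by row:
    M[i][0] = P_i and M[i][1:] is the citation row of publication year i.
    Every element of row i is divided by P_i (assumed > 0)."""
    return [[x / row[0] for x in row] for row in M]


M = [[10, 1, 4, 6], [20, 0, 2, 5], [30, 0, 0, 3]]   # made-up data
A = av_transform(M)
print([row[0] for row in A])  # [1.0, 1.0, 1.0]
```

Because each row is divided by its own first element, the first column of the result consists of ones, and the citation part now gives citations per publication.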

The R̄-transformation

The R̄-transformation is the multiplicative analogue of the R-transformation. It maps the matrix M to the pc matrix R̄(M). Like R, R̄ leaves the publication column P unchanged. R̄ transforms the c-submatrix with elements $C_{i,j}$ into the c-submatrix R̄(C) with elements $\bar{R}_{i,j}$, where

$$\bar{R}_{i,j} = \left( \prod_{k=1}^{n-j+i} C_{k,k+j-i} \right)^{P_i \left( \sum_{s=1}^{n-j+i} P_s \right)^{-1}}$$

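The R̄-transformation leaves the product of all elements of the pc matrix invariant: along each diagonal of the c-submatrix the exponents $P_i / \sum_{s} P_s$ add up to 1, so the product of the elements on that diagonal is preserved, and the publication column is not changed. Hence, it can be considered a multiplicative rearrangement of the pc matrix.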
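The multiplicative version can be sketched in the same way. This is an illustration under assumptions, not the authors' implementation: the name `rbar_transform` is hypothetical, C is n × n with rows as publication years and columns as citation years (0-based), and only cells with j ≥ i belong to the c-submatrix. Cells with j < i are returned as `None` rather than 0, because a 0 there would make every product vanish.

```python
import math


def rbar_transform(P, C):
    """Sketch of the R-bar transformation of the citation submatrix.

    For each diagonal d = j - i, G is the product of the citation counts on
    that diagonal and Rb[i][i + d] = G ** (P[i] / sum of P over its rows).
    Cells with j < i are not part of the c-submatrix and stay None.
    """
    n = len(P)
    Rb = [[None] * n for _ in range(n)]
    for d in range(n):
        m = n - d                                  # rows 0..m-1 lie on this diagonal
        diag_prod = math.prod(C[k][k + d] for k in range(m))
        diag_pubs = sum(P[s] for s in range(m))    # assumed > 0
        for i in range(m):
            # Exponents P_i / sum_s P_s add up to 1 along the diagonal.
            Rb[i][i + d] = diag_prod ** (P[i] / diag_pubs)
    return Rb


P = [10, 20, 30]                        # made-up data
C = [[1, 4, 6], [0, 2, 5], [0, 0, 3]]
Rb = rbar_transform(P, C)
cells = [x for row in Rb for x in row if x is not None]
print(math.prod(cells))   # close to 1*4*6*2*5*3 = 720, up to rounding
```

The product of the transformed cells matches the product of the original citation cells up to floating-point rounding, which is the multiplicative counterpart of the diagonal-sum invariance of R.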

Normalizing with respect to the size of the pool

The potential number of retrieved citations clearly depends on the size of the database used (the pool). As the pc matrix usually covers several periods, it is often a good idea to normalize it with respect to the size of the pool during each period. Let $Z_j$ denote the size of the pool in interval $I_j$, j = 1, …, n. The size-normalized pc matrix is then denoted N. As in the original matrix, the publication column does not change, but $C_{i,j}$ is transformed into $N_{i,j} = C_{i,j}/Z_j$.

This transformation
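A minimal sketch of size normalization, assuming the citation submatrix is stored as a list of rows and Z holds one positive pool size per citation period. The name `size_normalize` and the example pool sizes are hypothetical; what counts as the "size" of a pool (for instance, the number of source items indexed in a period) is left open here.

```python
def size_normalize(P, C, Z):
    """Size normalization sketch: N_{i,j} = C_{i,j} / Z_j, where Z[j] is the
    size of the pool in period j (assumed > 0). P is returned unchanged."""
    if any(len(row) != len(Z) for row in C):
        raise ValueError("need exactly one pool size per citation period")
    N = [[c / z for c, z in zip(row, Z)] for row in C]
    return list(P), N


P = [10, 20, 30]                        # made-up data
C = [[1, 4, 6], [0, 2, 5], [0, 0, 3]]
Z = [100, 200, 400]                     # made-up pool sizes per period
print(size_normalize(P, C, Z)[1][0])    # [0.01, 0.02, 0.015]
```

Unlike R, R̄ and AV, this transformation uses data from outside the pc matrix (the pool sizes), which is why Z has to be supplied separately.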

Other types of pc matrices

  • A.

    Instead of the matrix C, where $C_{i,j}$ denotes the total number of citations received in year j by articles published in a particular journal in year i, one may also use the simpler matrix T, where $T_{i,j}$ denotes the number of articles published in this journal in year i that are cited in year j. This ignores the precise number of citations each article received; hence $T_{i,j}$ equals $P_i$ minus the number of articles that are uncited in year j.

  • B.

    Besides a ‘cited’ perspective, it is
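For case A, the matrices C and T can both be built from per-article citation data. The sketch below assumes a hypothetical input layout (for each publication year, a list of articles, each given as its yearly citation counts); the function name `pc_and_t` and the example counts are illustrative only.

```python
def pc_and_t(articles_by_year, n_periods):
    """Build P, C and T from per-article data (hypothetical input layout):
    articles_by_year[i] lists the articles published in year i, and each
    article is a list whose j-th entry is its number of citations in year j."""
    P = [len(arts) for arts in articles_by_year]
    C = [[sum(a[j] for a in arts) for j in range(n_periods)]
         for arts in articles_by_year]
    # T counts articles cited at least once in year j, ignoring how often.
    T = [[sum(1 for a in arts if a[j] > 0) for j in range(n_periods)]
         for arts in articles_by_year]
    return P, C, T


data = [
    [[0, 3, 1], [0, 0, 2]],   # year 0: two articles (made-up counts)
    [[0, 1, 0]],              # year 1: one article
]
P, C, T = pc_and_t(data, 3)
print(C)  # [[0, 3, 3], [0, 1, 0]]
print(T)  # [[0, 1, 2], [0, 1, 0]]
```

Each entry of T equals the number of publications of that year minus the number of them that are uncited in the given year, as stated in case A.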

pc matrices with discrete steps

In the pc matrices studied thus far, rows and columns reflect time periods. In some studies, however, it is also meaningful to organize the pc matrix by discrete steps; more concretely, each row and column refers to exactly one article or one journal issue, presented in the order in which they are written or published. This approach is especially interesting in self-citation studies; see, e.g., Glänzel, Thijs, and Schlemmer (2004). We consider the two examples of (1) all articles published by one

Discussion and conclusion

The use of basic publication–citation matrices for the construction of informetric indicators has been highlighted. Transforming these matrices clarifies the construction of other indicators. Examples of such transformations, such as the R- and AV-transformations, are presented. A distinction has been made between transformations that use only elements of the given pc matrix and transformations that use external data, e.g. the size of the citation pool.

In the spirit of

Acknowledgements

The authors thank Leo Egghe for pointing out that the proof of Theorem 2 can easily be derived from Theorem 1. They also thank the anonymous referees for helpful observations. This work is sponsored by the National Natural Science Foundation of China (Project 70373055).

References (11)

  • R. Rousseau

    Conglomerates as a general framework for informetric research

    Information Processing and Management

    (2005)
  • C.B. Boyer et al.

    A history of mathematics

    (1989)
  • T.F. Frandsen et al.

    Article impact calculated over arbitrary periods

    Journal of the American Society for Information Science and Technology

    (2005)
  • W. Glänzel et al.

    A bibliometric approach to the role of author self-citations in scientific communication

    Scientometrics

    (2004)
  • P. Ingwersen et al.

    The publication–citation matrix and its derived quantities

    Chinese Science Bulletin

    (2001)
There are more references available in the full text version of this article.

Cited by (4)

  • Using extended r-impact to assess journal influence

    2009, IEEE Transactions on Reliability
  • Fundamental properties of rhythm sequences

    2008, Journal of the American Society for Information Science and Technology