Introduction

The present study explores ways of extending the present set of standard bibliometric indicators developed by our institute (CWTS) to items not included in source journals for the Web of Science (WoS) (Footnote 1). This is of particular importance for fields of scholarly research in which journals are not the dominant outlet for scholarly publications (e.g., Butler and Visser 2006; Cronin et al. 1997; Garfield 1979; Glänzel and Schoepflin 1999; Hicks 1999; Kyvik 2003; Lewison 2001; Lindholm-Romantschuk and Warner 1996; Lisee et al. 2008; Nederhof 1989; Nederhof et al. 1989; Thompson 2002) (Footnote 2). Generally, studies of research performance involving citation impact tend to be based on quantitative analysis of scientific articles published in journals and serials processed for the WoS versions of the Science Citation Index and associated citation indices: the Science Citation Index (SCI), the Social Sciences Citation Index (SSCI), and the Arts and Humanities Citation Index (A&HCI). The WoS covers only journals as sources of references. However, references in these ‘WoS journals’ include, in addition to publications in WoS journals, ‘non-WoS’ items, mainly non-journal items such as books and chapters, but also contributions to journals not covered by the WoS (‘non-WoS journals’). As we will show, the large majority of non-WoS items are non-journal items, and the principal aim of this study is to focus on the role of non-journal publications in political science, economics and psychology.

In bibliometric studies, limiting the monitoring of references in WoS journals exclusively to either the publications in the same journal or to those in other WoS journals may offer an incomplete view of scholarly citation impact in (1) fields in which journals are not of prime importance as a means of scholarly communication and/or (2) fields in which important journals are covered poorly by WoS. Here, measurement of citation impact may be improved by analyzing citations to important non-WoS items. Also, non-WoS citation impact may be of interest in fields in which non-WoS items, although not dominant, represent an important part of the output.

A problem with non-WoS items (like books) is that many are not cited at all in papers indexed by WoS. For instance, even though books can be cited frequently, we simply do not know how many books are not cited at all. Therefore, we cannot determine the average number of citations per publication for books, and, more generally, for non-WoS items. For the same reason, it is difficult to compute precise and reliable international citation reference values for non-WoS items (however, see Visser et al. 2004).

Standard CWTS citation indicators come in two main types. A first type compares the citation impact of a research unit with the average citation impact of the journals in which it publishes, or with the average impact of articles in the same field(s) (CWTS indicators CPP/JCSm and CPP/FCSm, respectively, see van Raan 2004). At present, it is hardly possible to construct this type of indicator for non-WoS items. A second type of citation indicator measures the contribution of a research unit to the highly cited items in its field(s), for instance the 10% most highly cited items. Although this approach cannot be copied directly to non-WoS items at present, it provides a useful model. The WoS does contain information about cited non-WoS items (namely the references to such non-WoS items in publications in WoS journals), and it is in principle possible to determine and extract the most highly cited items.

In the study described below, we did not just conduct a simple counting of citations to non-WoS publications (e.g., Nederhof 1989; Butler and Visser 2006), but we attempted to extract the most highly cited non-WoS items in three fields in the social and behavioral sciences: political science, economics, and psychology. These three fields show divergent publication patterns. Non-WoS items dominate in political science, they represent about half of the references in economics, while in psychology WoS journals are the dominant scholarly outlet medium (e.g., Moed 2005; Nederhof 2006; Nederhof 2008). We attempt to study to what extent the differences in publication preference among the three fields give rise to differences in citation patterns of non-WoS items. More specifically, we were interested in determining the importance of various types of non-WoS items in the three fields, and in identifying and extracting the most highly cited non-WoS items in each of the three fields.

In the delimitation of the three fields, the NOWT (Footnote 3) classification of fields has been used (Tijssen et al. 2008). Thus, the field of political science unites the WoS (sub)fields (i.e., ‘journal categories’) political science, international relations, and public administration, while the field of economics includes the WoS (sub)fields economics, business, business and finance, agricultural economics and policy, and industrial relations and labor. Finally, the field of psychology includes the WoS (sub)fields multidisciplinary psychology, applied psychology, biological psychology, clinical psychology, developmental psychology, experimental psychology, mathematical psychology, psychoanalytical psychology, and social psychology.

Identification of non-WoS items

Data

To reduce the enormous amount of non-WoS cited items to more manageable proportions, we focused on references in publications that were themselves highly cited, and carried at least one European address. The latter requirement was added as it allowed us to concentrate on references that are important to European high impact research. Also, this may help to neutralize a US bias, if present, in the selection of source journals (cf. Van Leeuwen 2006; Nederhof 2006).

We studied the non-WoS references in the 1997–2003 top-10% most highly cited WoS publications (worldwide) with at least one European address in each of the three fields separately. To determine the top-10% publications in each field, in each of the publication years, citations were counted using a moving fixed 4-year citation window. This means that for 1997 publications (articles and reviews only), citations were counted from 1997 to 2000, but not in 2001–2006; for 1998 publications we counted citations from 1998 to 2001, and so on until the last publication year, 2003, for which we counted citations from 2003 to 2006.
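The selection rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the input layout (a publication id, a publication year, and a list of citing years), the function names, and the handling of ties at the 10% boundary are hypothetical and are not taken from the CWTS database.

```python
import math
from collections import defaultdict

WINDOW_YEARS = 4  # fixed window: the publication year plus the three following years


def windowed_citations(pub_year, citing_years, window=WINDOW_YEARS):
    """Count citations received from pub_year up to pub_year + window - 1."""
    last_year = pub_year + window - 1
    return sum(1 for year in citing_years if pub_year <= year <= last_year)


def top_decile(publications, share=0.10):
    """Return the ids of the top `share` most cited publications per publication year.

    publications: iterable of (pub_id, pub_year, citing_years) tuples (assumed layout).
    Ties at the boundary are broken arbitrarily by id; this is an assumption.
    """
    by_year = defaultdict(list)
    for pub_id, pub_year, citing_years in publications:
        score = windowed_citations(pub_year, citing_years)
        by_year[pub_year].append((score, pub_id))

    selected = set()
    for scored in by_year.values():
        scored.sort(reverse=True)  # highest windowed citation count first
        n_top = max(1, math.ceil(share * len(scored)))
        selected.update(pub_id for _, pub_id in scored[:n_top])
    return selected
```

In an actual analysis the function would be applied per field, and the European-address filter would be applied to the publications before or after ranking, depending on how the worldwide top-10% is defined.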

From the reference lists of this set of highly cited publications, we removed publications in WoS journals (i.e., the WoS items) published in 1980–2003 (the most recent publication year of a reference in our total set is 2003, as this is the last year for which publications were included). Self-citations were not removed, as the full set of authors and initials for non-WoS references is not available in the WoS database. However, we intended to retrieve only the most frequently cited non-WoS references, where the number of self-citations is likely to be relatively small in comparison to the number of external citations, especially as the only self-citations counted are those that appear in the top-10% most highly cited WoS publications (worldwide) carrying at least one European address. Furthermore, only non-WoS references that were cited at least twice were included in our study. The reference strings used to identify non-WoS references contained no information on the authors. This was done in order to minimize the probability that one publication is falsely identified as two or more separate ones due to small differences in the combination of author names and initials. The reference strings did contain up to 20 positions with information on the title of the publication (in the case of articles, only the journal name was given, usually in abbreviated form), including abbreviations of, usually, one to three words. Furthermore, the reference strings contained information concerning year of publication (usually), the volume number (if any), and a page number (if any).
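As a rough illustration of how such author-free reference strings can be matched and counted, the following Python sketch builds a key from the title positions, year, volume, and page, and keeps the keys that occur at least twice. The dictionary layout of a reference, the upper-casing, and the whitespace normalization are assumptions made for this example; the actual strings were supplied in processed form by the database.

```python
from collections import Counter

TITLE_POSITIONS = 20  # the text mentions up to 20 positions of title information


def reference_key(title, year=None, volume=None, page=None):
    """Build an author-free key from the (abbreviated) title, year, volume, and page.

    Upper-casing and collapsing whitespace are assumptions of this sketch.
    """
    short_title = " ".join(title.upper().split())[:TITLE_POSITIONS]
    return (short_title, year, volume, page)


def frequently_cited(references, min_citations=2):
    """Count identical keys and keep those cited at least `min_citations` times.

    references: iterable of dicts with a "title" key and optional
    "year", "volume", and "page" keys (hypothetical layout).
    """
    counts = Counter(reference_key(**ref) for ref in references)
    return {key: n for key, n in counts.items() if n >= min_citations}
```

Because the key omits authors, variants in author spelling cannot split one publication into several, which is the benefit described above; the cost is that distinct works sharing an abbreviated title, year, volume, and page would be merged.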

Classification of document types

To classify references into document types, items were manually scrutinized independently by two persons (one broadly-oriented social and behavioral researcher and one librarian, both very experienced and knowledgeable in the three fields). Here, the focus was on reference strings that occurred more than twice. A first classification identified articles in journals and proceedings that are not covered by WoS. Typically, these were references that included both a volume number and a page number. In addition, references containing ‘J(ournal)’ were labeled as journal items even if these did not include both a volume number and a page number, except when such items could be identified otherwise (e.g., as a diary; see also below). This leaves, to a first approximation, the non-WoS non-journal items, which constitute the majority of the non-WoS items: books, monographs, chapters, theses, handbooks, manuals, working publications, reports, unpublished items, software, many contributions to proceedings, contributions to newspapers, encyclopedias, and so on. Non-WoS non-journal items tend to have several characteristics in common. They carry no volume number (or a low number), and, mostly, they carry no page number. Non-WoS references that lacked both a volume number and a page number were preliminarily classified as non-journal non-WoS items.
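The first-approximation rules described above can be written down as a small heuristic. The sketch below is not the procedure used in the study, which relied on manual inspection by two coders; it only encodes the stated rules. The function name, the regular expression for the ‘J(ournal)’ marker, and the "undecided" outcome for references carrying only a volume or only a page number are assumptions.

```python
import re

# Matches a standalone "J" (as in "ECON J") or the full word "JOURNAL".
JOURNAL_MARKER = re.compile(r"\bJ\b|\bJOURNAL\b", re.IGNORECASE)


def first_pass_type(title, volume=None, page=None):
    """Preliminary document type of a non-WoS reference string."""
    if volume is not None and page is not None:
        return "journal"  # both volume and page: typically a journal or proceedings article
    if JOURNAL_MARKER.search(title):
        return "journal"  # 'J(ournal)' in the title string, even without volume and page
    if volume is None and page is None:
        return "non-journal"  # neither volume nor page: book, report, thesis, and so on
    return "undecided"  # only one of the two present: left for manual review (assumption)
```

Such a pass can only pre-sort the material. Exceptions such as diaries, or handbooks that carry both a volume and a page number, still require manual correction.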

In the next step, several refinements were made. First, references that could be classified as (contribution to) a handbook were identified. These often contain both a volume number and a page number, and are therefore liable to be misidentified as journal items. Frequently, they contain just a volume number or just a page number. In addition, items that carried ‘in press’ or contained no year were labeled as a special type because such items easily collect spuriously high citation numbers by referring to more than one publication (e.g., ‘Nature, in press’). Furthermore, we identified working papers, (contributions to) proceedings, annual meetings and meetings of professional associations, reports, theses, chapters (not in a handbook), book reviews, software, manuals, unpublished publications, and so on, assisted by, but not exclusively relying on, information contained in the reference strings as processed by Thomson Reuters.

Adding address information

The above-mentioned reference strings contained no information at all on address, (primary and secondary) authors, publisher (if any) and so on. Therefore, it was rather difficult to identify publications, their authors, and their addresses. As discussed above, in this study we focused on items from the period after 1979, because these were present in our database as cited items. Moreover, for purposes of measuring scientific impact, these years are generally the most interesting. Reference strings from 1980 to 2003 with the highest number of citations were labeled with the address of the first author, if this information could be retrieved. To this end, we extensively used library sources, commercial information (e.g., www.Amazon.com) and information on the Internet.

Results

Document types and age of publications

In total about 28,000 non-WoS reference strings (hereafter called references) were collected for analysis. As discussed above, all of these occurred at least twice in the reference lists of the top-10% most frequently cited (WoS) publications in the three fields in 1997–2003 (see Table 1).

Table 1 Number of non-WoS references in the three fields

A first analysis addresses the age of the reference material. The pre-1980 references offer some insight into what are considered the most important ‘pioneering’ documents from the perspective of the authors of the most frequently cited publications in the three fields. Here, we counted each individual reference string as one, independently of the number of citations to it. Figure 1 shows the percentage of total references that date from before 1980 for all document types combined (excepting journals, as our database does not allow us to identify WoS journals prior to 1980), and for each of three main document types. Psychologists have the highest percentage of non-journal references from before 1980: 13% as compared to 12% for economists, and 10.5% for scholars in political science, but differences are slight. The oldest items, as evidenced by the publication year in the reference strings, were Hobbes’ Leviathan (1651) for economists, David Hume’s A Treatise of Human Nature (1739) for psychologists, and The Wealth of Nations (1776) by Adam Smith for political scientists.

Fig. 1 Percentage of non-WoS references to main document types dating prior to 1980

Among the pre-1980 reference strings, journal contributions (both WoS and non-WoS; as explained, our database distinguishes these only after 1979) outnumber book titles [here, including books, monographs, contributions to edited books, edited books, but not (contributions to) handbooks] in both psychology (3,149 vs. 1,331) and economics (1,420 vs. 918), but not in political science (204 vs. 473).

As shown in Fig. 1, in each of the three fields, the highest percentage of pre-1980 references occurred for book items [including books, monographs, contributions to edited books, edited books, but not (contributions to) handbooks]: varying between 13% (political science) and 18% (psychology). In contrast, (contributions to) handbooks, theses, reports, working publications and other document types were hardly cited if these stemmed from before 1980, with one exception in just one field. In psychology, 30% of the identified manuals (typically related to psychological tests or to diagnosis of mental disorders) dated from before 1980.

However, as the frequency of references may differ among document types, these figures may be in need of correction. A check among pre-1980 references in psychology revealed that contributions to journals accounted for 58% of the items and 64% of the total citations (items times frequency of occurrence), while books accounted for 19% of the items and 19% of the total citations. Thus, there seems little difference in share of items and share of citations in psychology. Perhaps more relevant to the monitoring of scientific impact, as it tends to center on relatively recent work, is the analysis of non-WoS references dating after 1979.
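The check described above compares, per document type, the share of distinct items with the share of total citations (items times frequency of occurrence). A minimal Python sketch of that comparison follows; the input layout, a list of (document type, times cited) pairs with one pair per reference string, is an assumption for illustration.

```python
from collections import defaultdict


def shares_by_type(references):
    """Return {doc_type: (share_of_items, share_of_citations)}.

    references: iterable of (doc_type, times_cited) pairs, one per reference string.
    """
    items = defaultdict(int)
    citations = defaultdict(int)
    for doc_type, times_cited in references:
        items[doc_type] += 1                 # each reference string counts once
        citations[doc_type] += times_cited   # weighted by frequency of occurrence

    total_items = sum(items.values())
    total_citations = sum(citations.values())
    return {
        doc_type: (items[doc_type] / total_items, citations[doc_type] / total_citations)
        for doc_type in items
    }
```

A document type whose citation share clearly exceeds its item share is cited more often per item than average; in the psychology check reported above the two shares are close for both journals and books.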

Figure 2 shows that books represent between 62 (psychology) and 81% (political science) of the non-WoS references dating after 1979. The document type with the next highest frequency in each of the three fields is ‘contributions to journals’. These range between 15 (political science) and 24% (psychology), indicating the good coverage of journals by WoS in these fields.

Fig. 2 Books and journals dominant among post-1979 non-WoS references

According to Fig. 3, (contributions to) handbooks represent 6% of the post-1979 references in psychology, and 2% in economics, but less than 1% in political science. Theses (2.6%) figure most often in economics, but are less important in psychology, and even less so in political science. Perhaps remarkably, (contributions to) proceedings account for 2% of the references in psychology, but are even less frequent in political science and especially economics. Reports account for 2% in economics, but hardly occur in the other two fields. Identified working papers appear infrequently, even in economics, where they account for only 0.5% of the references. Finally, manuals account for 0.8% in psychology, but hardly ever occur in the other two fields.

Fig. 3 Items with small frequencies among post-1979 non-WoS references. Note: Proc = proceedings, Working P = working paper

We also looked at the most highly cited publications for each document type, independent of the document’s age. This gives an indication of the relative importance of each document type for each field. Table 2 shows that books account for the most highly cited item in each of the fields but psychology. Here, a manual is clearly the most highly cited publication, while a book follows ahead of a journal article.

Table 2 Citation frequency of most highly cited non-WoS items by document type

Contributions to journals also include very highly cited non-WoS items. A third category of items that are cited at least 10 times in each of the three fields is formed by (contributions to) handbooks. These seem most important in economics. Contributions to proceedings are occasionally reasonably well cited in political science and, less clearly so, in psychology, but not in economics. Occasionally, a thesis is reasonably well cited in psychology, but not in the other two fields. Finally, software, frequently for statistical analyses, is cited well only in psychology. Other document types, such as working papers and reports, do not collect large numbers of citations. It could be objected that these figures, as they focus on the most highly cited item, represent outliers. However, the data show the presence of at least several nearly equally highly cited instances for each publication type. For instance, several manuals were highly cited in psychology.

Identification of top-ranked publications

For those items ranked in the top-50 of most frequently cited publications in our sample in each of the three fields, we attempted to identify the first author and his or her institutional address. This was done only for items more recent than 1979. The results are shown in Table 3.

Table 3 Identification of top-ranked post-1979 non-WoS publications

Again, psychology has the smallest percentage of post-1979 publications in its top-50 of most cited publications (34 or 68%), but it is followed closely by economics (70%). In contrast, political science has only 12% older material in its top-50. In all three fields, about 95% of the first authors of the post-1979 publications could be identified, while the percentage of identified addresses was only slightly lower, around 94%. Although the identification was time-consuming, these seem good rates, especially considering that the references included neither the first author nor any institutional address.

We also looked at the types of documents (here taken broadly as it might include digital media) making up the post-1979 top-ranked publications. In Table 4 we see that books [here more narrowly defined by excluding edited volumes, chapters, manuals, and (contributions to) handbooks] dominate the post-1979 top 50 rankings in the three fields. This is as expected, as individual books tend to be cited much more frequently (about three times as often) than individual articles (Nederhof 2006).

Table 4 Top-ranked post-1979 non-WoS publications according to document type

In psychology, books seem less dominant than in the other two fields, but this is due to manuals, mostly dealing with diagnosis of mental illnesses [two Diagnostic and Statistical Manual of Mental Disorders (DSM) editions] and, in particular, scales which attempt to measure psychological traits. Here, a manual on CD-ROM is included, concerning the linguistic CELEX database. One neurological atlas of brain maps has been classified here provisionally as a book rather than a manual. Manuals refer to or even constitute some of the tools in psychological research. They are by no means as prominent in either political science or economics. Combined, manuals and books make up 92% of the top-cited publications in psychology. Moreover, psychology is the only field in which a chapter has been ranked among the recent top-50 publications. In addition, edited volumes make up 5% of the total in political science, while none are ranked in either economics or psychology. Highly cited contributions to handbooks were found only in economics (3%). Contributions to journals make up 3% (psychology, economics) to 7% (political science), a modest, but not unexpected score. Finally, in each field, a few items could not be identified with sufficient certainty given the limited information and time available to us (3–6%).

Table 5 shows that even among top-cited publications with at least one address of a European country, the most cited publications tend to originate from the US for at least 50% (psychology, political science) up to 71% (economics), as evident from the address of the first author. Of course, some of the European top-10% publications (the source publications of our study) will have had US co-authors, but it will probably be less often than among a representative sample of WoS publications in the three fields. Thus, the findings provide some evidence for a US dominance in the three social and behavioral sciences, although most clearly so in economics.

Table 5 Top-ranked non-WoS publications according to country of origin

This notwithstanding, European countries are also well represented with shares of 33% (psychology) and 38% (political science) of the first authors, but to a lesser extent (18%) in economics. One European country, the UK, alone accounts for 9% of the first authors in economics, for 12% in psychology, and even for 20% in political science. However, first authors from the European continent (all from Western Europe) are found most often in psychology (21%), where they outnumber the UK authors. For political science (18%) and economics, European authors are almost evenly divided between the UK and the European continent. The relative importance of countries outside the US and Europe is largest in psychology (12%), and smaller in both economics (6%) and political science (5%).

At first blush, the top-ranked publications tended to be rather old, particularly in psychology and economics. The publication years of the top-ranked publications range over 1980–1997 (economics), 1982–1997 (psychology), and 1984–2000 (political science). Partly, this is due to the limitation of citing publications to 1997–2003, which makes it less likely that recent publications are cited highly.

Some examples of highly cited publications in the three fields, with the address of the first author, are given in Table 6. As elsewhere in this paper, citation totals refer to those given by the source publications of our study, the 1997–2003 top-10% most frequently cited publications in the three fields. In the field with the largest output, psychology, the citation impact of top-ranked publications is considerably higher than in the other two fields. However, the decline in citation impact is steep among psychology publications: the citation impact of the 16th ranked publication in psychology is equal to or just slightly higher than that of publications in both economics and political science.

Table 6 Instances of top-cited publications in three fields

Discussion and conclusions

In principle, it has been shown that it is possible to identify top-cited publications other than WoS publications, particularly non-journal publications, within fields in the social and behavioral sciences. Particularly in political science and psychology, European authors occur as first author on up to nearly 40% of the most frequently cited non-WoS publications. The yield of European authors may be even larger when also secondary authors are identified and located.

The present preliminary analyses yielded interesting insights into modern citing behavior and the importance of various types of documents in the three fields. For example, there is evidence that political science cites more recent publications than either psychology or economics. This is remarkable, as political science is characterized by a relatively high degree of publishing in book format, and, in general, these publications tend to be cited more slowly than journal articles (e.g., Nederhof 2006). Indeed, it was found that book items represent between 62 (psychology) and 81% (political science) of the non-WoS references, but journal articles accounted for no more than 15% (political science) up to 24% (psychology), reflecting the extensive, but certainly not exhaustive coverage of journals by WoS. Furthermore, the importance of manuals for testing and diagnosis has been shown for psychology. Handbooks were cited most frequently in economics. Books and manuals account for the most highly cited publications. Contributions to proceedings were occasionally reasonably well cited in political science and psychology, but not in economics, partly agreeing with results by Lisee et al. (2008), who found that proceedings have a relatively limited scientific impact. Between 50 (psychology, political science) and 71% (economics) of the top-ranked most cited publications originated from the US versus between 18 (economics) and 38% (psychology) from Europe. This provides evidence for the important role of the US in these fields, even when articles carrying at least one European address are used as a source of references (see also Van Leeuwen 2006).

Necessarily, the present data are incomplete and approximate in nature. In particular, adding the first author and his or her first initial to the reference strings might be helpful in better distinguishing edited volumes and contributions to edited volumes, and publications carrying the same abbreviated title in WoS references. Furthermore, including first author and initials in reference strings might be helpful in retrieving address material both more often and more easily. This might also contribute to a more robust identification, as items with identical titles can be distinguished. Concerning the issue of self-citations, these tend to be less of a problem for highly cited non-journal items than for infrequently cited items, as most authors would not be productive enough to generate high levels of self-citations to a single work, especially as the only self-citations counted are those that appear in the top-10% most highly cited papers. Also, the present study followed publications for up to 24 years, and self-citations are most likely to occur in the first 3 years after publication (e.g., Schubert et al. 2006). Moreover, it has been found that the share of self-citations is lower for more frequently cited items than for less frequently cited items (e.g., Garfield 1979). Nevertheless, although it is technically difficult, the elimination of self-citations is an issue that might be addressed in follow-up research, especially if one wishes to extend results to items that are relatively well cited but do not have very high absolute citation frequencies.

Particularly in economics and psychology, the top-ranked publications tended to be relatively old, which might limit the usefulness of the results for bibliometric monitoring. Using a more recent set of publications as a source for the extraction of references might alleviate this problem.

We focused in this study on a specific source of data, namely the non-journal references of those publications in the three fields that belong to the top-10% of the impact distribution for these fields, and that also carried at least one European address. This selection certainly helps considerably in reducing the large number of potentially relevant references, and it focuses on those references that have the special attention of authors of very high impact publications, but it also has drawbacks. These top-10% publications may not represent all topics equally well. In principle, the remaining 90% of the publications may show other preferences in citing. Thus, valuable publications may have been overlooked. The present approach might provide a better guide to what will be cited as the most relevant research in the near future than the complete set of papers in a field.

Nevertheless, a more complete set of data is needed in follow-up studies. A clear strength of the present study is its focus on titles of publications, which prevents small variations in the names and initials of the first author from affecting citation counts. However, it may be possible to keep this focus while adding, sequentially and in a separate field, the names and initials of the first author. Then, in a further processing step, keys of author names and first initials may be used to unify at least some of the material. In principle, it is possible to automate this step wholly or partially. This prevents the merging of publications by different authors that have been assigned the same abbreviated title in the same year by the publication database. Another major benefit of this procedure would be that the final step of adding addresses to the author(s) of the publication can be executed in a much more efficient and accurate manner, yielding a better and more valid list of top cited non-journal publications.
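The proposed follow-up step, keeping the title-based string but adding a separate key of first-author name and first initial, could look roughly like the Python sketch below. The normalization choices (stripping diacritics, dropping spaces and punctuation, using only the first initial) and the function names are assumptions made for illustration; they are not part of the present study.

```python
import unicodedata
from collections import defaultdict


def author_key(surname, initials):
    """Normalize a first author to SURNAME_firstinitial, e.g. 'van Raan', 'A.F.J.' -> 'VANRAAN_A'."""
    decomposed = unicodedata.normalize("NFKD", surname)
    ascii_surname = decomposed.encode("ascii", "ignore").decode("ascii")  # drop diacritics
    letters = "".join(ch for ch in ascii_surname.upper() if ch.isalpha())
    first_initial = next((ch for ch in initials.upper() if ch.isalpha()), "")
    return f"{letters}_{first_initial}"


def count_by_title_and_author(references):
    """Count citations per (title key, author key) pair.

    references: iterable of (title_key, surname, initials) tuples (hypothetical layout).
    """
    counts = defaultdict(int)
    for title_key, surname, initials in references:
        counts[(title_key, author_key(surname, initials))] += 1
    return dict(counts)
```

Using only the first initial tolerates variation in later initials, while the separate author field keeps works by different authors apart even when they share an abbreviated title and year.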

Although the present study has been limited to three disciplines in the social and behavioral sciences, the approach seems suited for applications in other disciplines in which non-journal document types constitute an important part of the output, such as fields in the humanities, in other social and behavioral sciences, but also in fields such as mathematics, information science, parts of biology, and engineering. In nation-wide monitoring efforts of scientific and scholarly excellence such as the RAE in the UK and ERA in Australia, the inclusion of a broader range of cited document types might contribute to a wider acceptance of bibliometric monitoring.