Elsevier

Journal of Informetrics

Volume 4, Issue 2, April 2010, Pages 185-193

Subject clustering analysis based on ISI category classification

https://doi.org/10.1016/j.joi.2009.11.005

Abstract

The study focuses on the analysis of the information flow among the ISI subject categories and aims at finding an appropriate field structure of the Web of Science using the subject clustering algorithm developed in previous studies. The clustering of journals and the clustering of ISI subject categories provide two subject classification schemes from different perspectives and at different levels. The two clustering results have been compared, and their accordance and divergence have been analyzed. Several indicators have been used to compare the communication characteristics among different ISI subject categories. The neighbour map of each category clearly reflects the affinities between the “core” category and its surrounding satellites.

Introduction

A series of previous studies focused on journal clustering based on a complete journal–journal cross-citation matrix (Janssens et al., 2009; Zhang et al., 2009a; Zhang et al., 2009b). ISI, now part of Thomson Scientific, has assigned each indexed journal to one or more subject categories. Based on this classification scheme, the journal–journal matrix can be aggregated to a category–category matrix, which is much more densely populated than the matrix at the journal level. The present study focuses on the analysis of the information flow among the ISI subject categories, for two reasons. First, the exercise aims at finding an appropriate field structure of the Web of Science using the subject clustering algorithm developed in previous studies. Since ISI subject categories are based on journal assignment, the question arises of what changes if journal cross-citation is replaced by subject cross-citation. If the changes are not essential, the elaborate clustering of more than 8000 journals could be substituted by a somewhat easier analysis of roughly 250 ISI categories, and the journal level could, as it were, be skipped. We stress, however, that cross-citations are calculated from individual paper-to-paper links whatever aggregation level is chosen. The second reason is to analyze whether multiple journal assignment to subject categories interferes with, distorts or even determines the resulting cluster structure. Before introducing the methodological rudiments, we briefly summarise the historical background and the outcomes of previous or related studies.

Along with the development of computerised scientometrics, the mapping of science has come to play an important role in the construction and analysis of the structure of science. For instance, a variety of techniques for analyzing journal–journal citation relationships have been reported in the literature (Doreian and Fararo, 1985; Leydesdorff, 2006; Tijssen et al., 1987). An alternative method, co-citation clustering, has been investigated in constructing a World Atlas of Sciences for ISI (Garfield et al., 1975; Leydesdorff, 1987; Small, 1999). Boyack, Klavans, and Börner (2005) applied eight alternative measures of journal similarity to a dataset of 7121 journals covering over one million documents in the combined Science Citation and Social Sciences Citation Indexes, and produced a global map of science using the force-directed graph layout tool VxOrd. Chen (2008) proposed an approach to classify scientific networks in terms of aggregated journal–journal citation relations of the ISI Journal Citation Reports using the affinity propagation method. As mentioned at the outset, Zhang, Glänzel, et al. (2009) and Zhang, Janssens, et al. (2009) have also investigated different methods for the analysis and classification of scientific journals. Besides using journals as the units of analysis, some recent investigations focus on the structure of science based on subject categories. Glänzel and Schubert (2003) designed a new classification scheme of science fields and subfields for scientometric evaluation purposes. Moya-Anegón et al. (2004) proposed a new technique that uses thematic classifications as entities of co-citation, and presented an ego-centred network of 222 ISI categories including science and social sciences. Leydesdorff and Rafols (2009) classified the 172 ISI science categories into 14 groups based on factor analysis, and compared the interdisciplinarity of each category using betweenness centrality.
In contrast to these studies, we applied a new clustering technique to classify the ISI science and social sciences categories into seven groups based on category–category cross-citation similarities, and compared the results with the seven-cluster hybrid clustering solution for 8305 journals obtained in a previous study (Zhang, Janssens, et al., 2009). In addition, several indicators have been used to analyze the communication characteristics of different categories.
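To make the notion of category–category cross-citation similarity concrete, the following is a minimal sketch and not the authors' procedure. It assumes cosine similarity between symmetrised citation profiles; the similarity measure, the symmetrisation step, the function name `cosine_similarity_matrix` and the toy numbers are all illustrative assumptions that do not come from the text above.

```python
import numpy as np

def cosine_similarity_matrix(cross_citations):
    """Cosine similarity between the citation profiles of categories.

    cross_citations[i, j] is the number of citations from category i to
    category j. The choice of cosine similarity is an assumption made
    for illustration only.
    """
    C = np.asarray(cross_citations, dtype=float)
    # Assumption: combine citing and cited directions into one profile.
    profiles = C + C.T
    norms = np.linalg.norm(profiles, axis=1)
    norms[norms == 0] = 1.0  # leave all-zero profiles as zero vectors
    unit = profiles / norms[:, None]
    return unit @ unit.T

# Hypothetical cross-citation counts among three categories.
toy = np.array([[18, 10, 0],
                [10, 20, 1],
                [0, 2, 30]])
sim = cosine_similarity_matrix(toy)
print(np.round(sim, 3))
```

In this toy example the first two categories exchange many citations and therefore end up more similar to each other than either is to the third. A clustering step would then operate on such a similarity matrix.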


Data sources and processing

The data were collected from the Web of Science of Thomson Reuters. Altogether, 9487 journals assigned to the 246 categories of the sciences, social sciences, and arts and humanities over the period 2002–2006 were selected, and only three document types (article, letter and review) were taken into consideration. More than six million papers were indexed, and citations were counted within a variable citation window running from the publication year to 2006.

Methods

As already mentioned at the outset, citation links are determined on the basis of paper-by-paper assignment, which provides several advantages over other approaches (Zhang, Glänzel, et al., 2009). Cross-citation data are aggregated at three successive levels: document-to-document, then journal-to-journal, and finally ISI category-to-category (or large domain-to-domain). The previous work focussed on the journal level, and now our focus turns to the higher level of ISI subject
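The step from journal level to category level can be sketched as a matrix product with a journal-to-category membership matrix. This is a minimal illustration under stated assumptions, not the authors' implementation: the excerpt does not say how citations of journals assigned to several categories are counted, so the sketch offers whole counting by default and equal fractional counting as an option. The function name `aggregate_to_categories` and the toy data are hypothetical.

```python
import numpy as np

def aggregate_to_categories(journal_citations, membership, fractional=False):
    """Aggregate a journal-journal cross-citation matrix to category level.

    journal_citations[i, j]: citations from journal i to journal j.
    membership[i, k]: 1 if journal i is assigned to category k, else 0.
    fractional: if True, split each journal's counts equally among its
        categories (an assumption; every journal then needs at least
        one category to avoid division by zero).
    """
    J = np.asarray(journal_citations, dtype=float)
    M = np.asarray(membership, dtype=float)
    if fractional:
        M = M / M.sum(axis=1, keepdims=True)
    # Entry [k, l] sums citations from journals in category k
    # to journals in category l.
    return M.T @ J @ M

# Hypothetical data: three journals, two categories; the second
# journal is assigned to both categories.
journal_citations = np.array([[10, 2, 0],
                              [1, 5, 3],
                              [0, 4, 8]])
membership = np.array([[1, 0],
                       [1, 1],
                       [0, 1]])

full = aggregate_to_categories(journal_citations, membership)
frac = aggregate_to_categories(journal_citations, membership, fractional=True)
print(full)
print(frac)
```

With whole counting, citations of a multiply assigned journal are counted once per category, so the category-level total exceeds the journal-level total. With fractional counting the total is preserved. Which of the two is closer to the procedure used in the study cannot be determined from this excerpt.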

Results

The number of subject assignments in the Web of Science (SCIE, SSCI, AHCI) is 14,608 for 9487 journals during 2002–2006, that is, roughly 1.54 categories per journal. The average number of journals per category is 59.4. Fig. 1 presents the 15 largest ISI subject categories, each of which has more than 150 journals.
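The two averages follow directly from the counts reported in the text (14,608 assignments, 9487 journals, 246 categories). A short check, with variable names chosen here for illustration:

```python
# Counts as reported in the text.
assignments = 14608
journals = 9487
categories = 246

categories_per_journal = assignments / journals
journals_per_category = assignments / categories

print(round(categories_per_journal, 2))  # 1.54
print(round(journals_per_category, 1))   # 59.4
```

Note that the per-category average counts a multiply assigned journal once in each of its categories.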

Among the 9487 journals under study, roughly 60% are assigned to a single category in the ISI subject classification, and the others have multiple assignments. The most

Conclusions and discussions

The Multi-level Aggregation Method generates a balanced clustering result for the ISI subject categories. The clusters are clearly distinguishable from each other by their components, and each cluster represents one scientific domain. Several indicators have been used to compare the communication characteristics among different ISI subject categories. In general, social sciences categories are inclined to enlarge their link distributions, while science categories tend to have deep

Acknowledgement

The research was supported by the Centre for R&D Monitoring of the Flemish Government, the National Natural Science Foundation of China (grant no. 70673019), the China Scholarship Council and ERCMAMT (Engineering Research Centre of Metallurgical Automation and Measurement Technology, Ministry of Education, Hubei, China).

References (18)

  • F. Janssens et al. Hybrid clustering for validation and improvement of subject-classification schemes. Information Processing & Management (2009)
  • V. Batagelj et al. Pajek – analysis and visualization of large networks. Graph Drawing (2002)
  • V.D. Blondel et al. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment (2008)
  • K.W. Boyack et al. Mapping the backbone of science. Scientometrics (2005)
  • C.M. Chen. Classification of scientific networks using aggregated journal–journal citation relations in the journal citation reports. Journal of the American Society for Information Science and Technology (2008)
  • P. Doreian et al. Structural equivalence in a journal network. Journal of the American Society for Information Science (1985)
  • E. Garfield et al. A system for automatic classification of scientific literature. Journal of the Indian Institute of Science (1975)
  • W. Glänzel et al. A new classification scheme of science fields and subfields designed for scientometric evaluation purposes. Scientometrics (2003)
  • L. Leydesdorff. Various methods for the mapping of science. Scientometrics (1987)

Cited by (60)

  • Stochastic block model reveals maps of citation patterns and their evolution in time

    2018, Journal of Informetrics
    Citation Excerpt :

    We use journal citation networks from Thomson Reuters Citation Index® for the years ranging from 1900 to 2013 which contains hundreds of millions of citations. Many studies concentrated on small subsets of the citation network (An, Janssen, & Milios, 2004; Grossman, 2002; Nerur et al., 2005; Pieters, Baumgartner, Vermunt, & Bijmolt, 1999; Porter & Rafols, 2009; Shibata et al., 2011; Zhang et al., 2010), while others were interested in large-scale patterns (Boyack et al., 2005; de Moya-Anegón et al., 2007; Leydesdorff & Rafols, 2009). We focus on the large scale citation networks that are constructed using all articles in this bibliographic dataset.

  • Granularity of algorithmically constructed publication-level classifications of research publications: Identification of topics

    2018, Journal of Informetrics
    Citation Excerpt :

    More advanced approaches have been proposed for journal classification in recent decades. These approaches use citation relations between journals for their classification (Archambault, Caruso, & Beauchesne, 2011; Boyack, Klavans, & Börner, 2005; Chen, 2008; Doreian, 1988; Leydesdorff, 1987, 2006; Leydesdorff, Bornmann, & Wagner, 2017; Pudovkin & Garfield, 2002; Rosvall & Bergstrom, 2011; Small & Koenig, 1977; Zhang, Liu, Janssens, Liang, & Glänzel, 2010). The many limits of journal-level classification have been acknowledged in the literature (Archambault et al., 2011).
