Optimal and hierarchical clustering of large-scale hybrid networks for scientific mapping

Scientometrics

Abstract

Previous studies have shown that hybrid clustering methods based on textual and citation information outperform clustering methods that use only one of these components. However, earlier methods rely on the vector space model. In this paper we apply a hybrid clustering method based on the graph model to map the Web of Science database at the level of the journals it covers. Compared with earlier hybrid clustering strategies, our method is very fast and achieves even better clustering accuracy. In addition, it detects the number of clusters automatically and provides a top-down hierarchical analysis, which suits practical application. We quantitatively and qualitatively assess the added value of such an integrated analysis, and we investigate whether the clustering outcome provides an appropriate representation of the field structure by comparing it with text-only and citation-only clustering and with another hybrid method based on a linear combination of distance matrices. Our dataset consists of about 8,000 journals published in the period 2002–2006. The cognitive analysis, including the ranked journals, term annotation, and visualization of the cluster structure, demonstrates the efficiency of our strategy.
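To make the idea of graph-based hybrid clustering concrete, the following is a minimal sketch and not the authors' implementation. It assumes the hybrid graph is a weighted average of a text-similarity graph and a citation-similarity graph (the abstract does not state how the two are combined), and it scores candidate partitions with Newman's modularity, the quality function optimized by the community detection literature cited in the references. The function names, the mixing weight `alpha`, and the four-journal toy matrices are all hypothetical.

```python
def combine_graphs(text_sim, cite_sim, alpha=0.5):
    """Weighted average of two symmetric similarity matrices.

    Assumption: this combination scheme and `alpha` are illustrative only.
    """
    n = len(text_sim)
    return [[alpha * text_sim[i][j] + (1 - alpha) * cite_sim[i][j]
             for j in range(n)] for i in range(n)]


def modularity(weights, labels):
    """Newman's modularity Q of a partition of a weighted undirected graph."""
    n = len(weights)
    strength = [sum(row) for row in weights]  # weighted degree of each node
    two_m = sum(strength)                     # twice the total edge weight
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                # observed weight minus the weight expected at random
                q += weights[i][j] - strength[i] * strength[j] / two_m
    return q / two_m


# Toy data (invented): journals 0-1 and journals 2-3 form two topical groups.
text_sim = [[0.0, 0.9, 0.1, 0.0],
            [0.9, 0.0, 0.0, 0.1],
            [0.1, 0.0, 0.0, 0.8],
            [0.0, 0.1, 0.8, 0.0]]
cite_sim = [[0.0, 0.7, 0.0, 0.1],
            [0.7, 0.0, 0.1, 0.0],
            [0.0, 0.1, 0.0, 0.9],
            [0.1, 0.0, 0.9, 0.0]]

hybrid = combine_graphs(text_sim, cite_sim)
q_two_clusters = modularity(hybrid, [0, 0, 1, 1])  # the natural grouping
q_one_cluster = modularity(hybrid, [0, 0, 0, 0])   # everything merged
q_mismatched = modularity(hybrid, [0, 1, 0, 1])    # groups split the wrong way
print(q_two_clusters, q_one_cluster, q_mismatched)
```

Because modularity compares partitions with different numbers of clusters on the same scale, an optimizer that maximizes it selects the number of clusters as a by-product, which is one way a method can detect the number of clusters automatically.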

Notes

  1. http://sites.google.com/site/findcommunities/

References

  • Baeza-Yates, R. A., & Ribeiro-Neto, B. (1999). Modern information retrieval. Boston, MA: Addison-Wesley Longman Publishing Co., Inc.

  • Blondel, V. D., Guillaume, J. L., Lambiotte, R., & Lefebvre, E. (2008). Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10), P10008.

  • Braam, R. R., Moed, H. F., & van Raan, A. F. J. (1991a). Mapping of science by combined co-citation and word analysis, Part I: Structural aspects. Journal of the American Society for Information Science, 42(4), 233–251.

  • Braam, R. R., Moed, H. F., & van Raan, A. F. J. (1991b). Mapping of science by combined co-citation and word analysis, Part II: Dynamical aspects. Journal of the American Society for Information Science, 42(4), 252–266.

  • Calado, P., Ribeiro-Neto, B., Ziviani, N., Moura, E., & Silva, I. (2003). Local versus global link information in the web. ACM Transactions on Information Systems, 21, 42–63.

  • Calado, P., Cristo, M., Gonçalves, M. A., de Moura, E. S., Ribeiro-Neto, B., & Ziviani, N. (2006). Link-based similarity measures for the classification of web documents. Journal of the American Society for Information Science and Technology, 57, 208–221.

  • Clauset, A., Newman, M. E. J., & Moore, C. (2004). Finding community structure in very large networks. Physical Review E, 70(6), 066111.

  • Fortunato, S. (2010). Community detection in graphs. Physics Reports, 486, 75–174.

  • Glenisson, P., Glänzel, W., Janssens, F., & De Moor, B. (2005). Combining full text and bibliometric information in mapping scientific disciplines. Information Processing and Management, 41, 1548–1572.

  • Hatcher, E., & Gospodnetić, O. (2004). Lucene in action. Greenwich, CT: Manning Publications Co.

  • He, X., Zha, H., Ding, C., & Simon, H. (2002). Web document clustering using hyperlink structures. Computational Statistics and Data Analysis, 41(1), 19–45.

  • Hubert, L., & Arabie, P. (1985). Comparing partitions. Journal of Classification, 2(1), 193–218.

  • Jaccard, P. (1901). Étude comparative de la distribution florale dans une portion des alpes et des jura. Bulletin de la Société Vaudoise des Sciences Naturelles, 37, 547–579.

  • Jain, A. K. (2010). Data clustering: 50 Years beyond k-means. Pattern Recognition Letters, 31(8), 651–666.

  • Jain, A. K., & Dubes, R. C. (1988). Algorithms for clustering data. Englewood Cliffs, NJ: Prentice Hall.

  • Janssens, F., Leta, J., Glänzel, W., & De Moor, B. (2006a). Towards mapping library and information science. Information Processing and Management, 42, 1614–1642.

  • Janssens, F., Tran Quoc, V., Glänzel, W., & De Moor, B. (2006b). Integration of textual content and link information for accurate clustering of science fields. In Proceedings of the I international conference on multidisciplinary information sciences and technologies, InSciT2006 (pp 615–619).

  • Janssens, F., Glänzel, W., & De Moor, B. (2008). A hybrid mapping of information science. Scientometrics, 75(3), 607–631.

  • Janssens, F., Zhang, L., De Moor, B., & Glänzel, W. (2009). Hybrid clustering for validation and improvement of subject-classification schemes. Information Processing and Management, 45(6), 683–702.

  • Joachims, T., Cristianini, N., & Shawe-Taylor, J. (2001). Composite kernels for hypertext categorisation. In Proceedings of the eighteenth international conference on machine learning, ICML’01 (pp 250–257).

  • Krings, G., Calabrese, F., Ratti, C., & Blondel, V. D. (2009). Urban gravity: A model for inter-city telecommunication flows. Journal of Statistical Mechanics: Theory and Experiment, 2009, L07003.

  • Lambiotte, R., & Panzarasa, P. (2009). Communities, knowledge creation, and information diffusion. Journal of Informetrics, 3(3), 180–190.

  • Leydesdorff, L., & Rafols, I. (2009). A global map of science based on the ISI subject categories. Journal of the American Society for Information Science and Technology, 60, 348–362.

  • Liu, X., Glänzel, W., & De Moor, B. (2011). A hierarchical and optimal clustering of WoS journal database by hybrid information. In E. Noyons, P. Ngulube, & J. Leta (Eds.), Proceedings of ISSI 2011—the 13th international conference on scientometrics and informetrics, Durban, South Africa, pp 485–496.

  • Luxburg, U. (2007). A tutorial on spectral clustering. Statistics and Computing, 17(4), 395–416.

  • Modha, D. S., & Spangler, W. S. (2000). Clustering hypertext with applications to web searching. In Proceedings of the 7th ACM on hypertext and hypermedia (pp 143–152). New York, NY: ACM Press.

  • Mullins, N., & Snizek, K. W. O. (1988). The structural analysis of a scientific paper. Handbook of quantitative studies of science and technology (pp 81–105). New York, NY: Elsevier Science.

  • Newman, M. E. J. (2004). Analysis of weighted networks. Physical Review E, 70(5), 056131.

  • Newman, M. E. J. (2006a). Finding community structure in networks using the eigenvectors of matrices. Physical Review E, 74(3), 036104.

  • Newman, M. E. J. (2006b). Modularity and community structure in networks. Proceedings of the National Academy of Sciences of the USA, 103(23), 8577–8582.

  • Porter, M. A., Onnela, J. P., & Mucha, P. J. (2009). Communities in networks. Notices of the American Mathematical Society, 56(9), 1082–1097, 1164–1166.

  • Salton, G., & McGill, M. J. (1986). Introduction to modern information retrieval. New York, NY: McGraw-Hill, Inc.

  • Snizek, K. W. O., & Mullins, N. (1991). Textual and nontextual characteristics of scientific papers: Neglected science indicators. Scientometrics, 20(1), 25–35.

  • Strehl, A., & Ghosh, J. (2002). Cluster ensembles—a knowledge reuse framework for combining multiple partitions. Journal of Machine Learning Research, 3, 583–617.

  • Tang, L., Wang, X., & Liu, H. (2010). Community detection in multi-dimensional networks. Technical Report TR10-006. Tempe, AZ: School of Computing, Informatics, and Decision Systems Engineering, Arizona State University.

  • Wang, Y., & Kitsuregawa, M. (2002). Evaluating contents-link coupled web page clustering for web search results. In Proceedings of the eleventh international conference on Information and knowledge management, CIKM ’02 (pp 499–506).

  • Zhang, L., Liu, X., Janssens, F., Liang, L., & Glänzel, W. (2010). Subject clustering analysis based on ISI category classification. Journal of Informetrics, 4(2), 185–193.

Acknowledgments

This article is an extended version of a paper presented at the 13th International Conference on Scientometrics and Informetrics, Durban (South Africa), 4–7 July 2011 (Liu et al. 2011). The work was supported by (i) The joint post-doctoral programme by Credit Reference Center and Financial Research Institute, The People’s Bank of China; (ii) National Natural Science Foundation of China (Grant No. 61105058); (iii) Research Council KUL: ProMeta, GOA Ambiorics, GOA MaNet, Co-EEF/05/006, PFV/10/016 SymBioSys, START 1, Optimization in Engineering (OPTEC), IOF-SCORES4CHEM, several PhD/postdoc and fellow grants; (iv) FWO: G.0302.07 (SVM/Kernel), G.0318.05 (subfunctionalization), G.0553.06 (VitamineD), research communities (ICCoS, ANMMM, MLDM); G.0733.09 (3UTR), G.082409 (EGFR); (v) IWT: PhD Grants, Eureka-Flite+, Silicos; SBO-BioFrame, SBO-MoKa, SBO LeCoPro, SBO Climaqs, SBO POM, TBM-IOTA3, O&O-Dsquare; (vi) IBBT; (vii) Belgian Federal Science Policy Office: IUAP P6/04 (DYSCO, Dynamical systems, control and optimization, 2007–2011); (viii) Flemish Government: Center for R&D Monitoring (ECOOM); (ix) EU-RTD: ERNSI: European Research Network on System Identification; FP7-HEALTH CHeartED; FP7-HD-MPC (INFSO-ICT-223854), COST intelliCIS, FP7-EMBOCON (ICT-248940).

Author information

Corresponding author

Correspondence to Xinhai Liu.

About this article

Cite this article

Liu, X., Glänzel, W. & De Moor, B. Optimal and hierarchical clustering of large-scale hybrid networks for scientific mapping. Scientometrics 91, 473–493 (2012). https://doi.org/10.1007/s11192-011-0600-x

  • DOI: https://doi.org/10.1007/s11192-011-0600-x
