
Hybrid clustering of multi-view data via Tucker-2 model and its application

Scientometrics

Abstract

With the rapid development of modern technology, most entities can be observed from different perspectives. Such multi-view information allows us to find better patterns, provided the views are integrated appropriately. Clustering by integrating multi-view representations that describe the same class of entities has therefore become a crucial issue for knowledge discovery. We integrate multi-view data with a tensor model and present a hybrid clustering method based on the Tucker-2 model, which can be regarded as an extension of spectral clustering. We apply our hybrid clustering method to the analysis of scientific publications by integrating citation links and lexical content. Clustering experiments are conducted on a large-scale journal set retrieved from the Web of Science (WoS) database, and several related hybrid clustering methods are compared with our method. The analysis of the clustering results demonstrates the effectiveness of the proposed algorithm. Furthermore, we provide a cognitive analysis of the clustering results, together with a visualization that maps the journal set.
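The abstract describes stacking several views of the same objects into a tensor and clustering them with a Tucker-2 model as an extension of spectral clustering. The sketch below illustrates that general idea under our own assumptions; it is not the authors' published algorithm. We assume each view k is a symmetric, nonnegative N×N similarity matrix \(S_k\) (for example, citation links or lexical similarity), normalized as \(A_k = D_k^{-1/2} S_k D_k^{-1/2}\); the \(A_k\) form the frontal slices of an N×N×K tensor. A symmetric Tucker-2 model \(A_k \approx U G_k U^\top\) with one shared orthonormal factor \(U\) (N×r) is fitted by maximizing \(\sum_k \|U^\top A_k U\|_F^2\) with a HOOI-style update, and the row-normalized rows of \(U\) are clustered with k-means. All function names, the deterministic farthest-point seeding (used here instead of k-means++ for reproducibility) and the toy data are hypothetical.

```python
import numpy as np


def normalize_view(S):
    """Return D^{-1/2} S D^{-1/2} for a symmetric, nonnegative similarity matrix S."""
    d = S.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))  # guard against isolated nodes
    return S * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def top_eigvecs(M, r):
    """Orthonormal eigenvectors of symmetric M for its r largest eigenvalues."""
    _, vecs = np.linalg.eigh(M)  # eigh sorts eigenvalues in ascending order
    return vecs[:, ::-1][:, :r]


def tucker2_embedding(views, r, n_iter=50, tol=1e-10):
    """Fit a symmetric Tucker-2 model A_k ~ U G_k U^T with one shared U (N x r).

    Maximizes sum_k ||U^T A_k U||_F^2 subject to U^T U = I with a HOOI-style
    update; this heuristic may stop at a local optimum.
    """
    A = [normalize_view(S) for S in views]  # frontal slices of the N x N x K tensor
    U = top_eigvecs(sum(A), r)  # init: spectral embedding of the summed views
    prev = -np.inf
    for _ in range(n_iter):
        # M = sum_k A_k U U^T A_k; A_k is symmetric, so this is B_k B_k^T with B_k = A_k U
        M = sum((Ak @ U) @ (Ak @ U).T for Ak in A)
        U = top_eigvecs(M, r)
        fit = sum(np.linalg.norm(U.T @ Ak @ U) ** 2 for Ak in A)
        if fit - prev < tol:  # stop when the objective no longer improves
            break
        prev = fit
    cores = [U.T @ Ak @ U for Ak in A]  # r x r core slices G_k
    return U, cores


def kmeans(X, k, n_iter=100):
    """Lloyd's k-means with deterministic farthest-point seeding."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])  # next seed: point farthest from all seeds
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def hybrid_cluster(views, k):
    """Cluster objects described by several similarity views into k groups."""
    U, _ = tucker2_embedding(views, r=k)
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)  # row-normalize
    return kmeans(U, k)


# Toy example: 30 objects in 3 planted groups, observed through two noisy views.
rng = np.random.default_rng(0)
groups = np.repeat(np.arange(3), 10)
same = (groups[:, None] == groups[None, :]).astype(float)


def noisy_view(within, between):
    E = rng.uniform(0.0, 0.05, size=same.shape)
    return within * same + between * (1.0 - same) + (E + E.T) / 2  # keep it symmetric


views = [noisy_view(1.0, 0.05), noisy_view(0.6, 0.2)]
labels = hybrid_cluster(views, k=3)
```

With the iterations removed, the initialization alone is ordinary normalized spectral clustering on the summed views; the Tucker-2 updates then re-estimate the shared subspace from every view's projection onto it. This is one plausible reading of "an extension of spectral clustering", not the paper's exact formulation.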


Figs. 1 to 9 (images not reproduced)




Acknowledgements

We are grateful for related work carried out with Professor Lieven De Lathauwer at K.U. Leuven. Bart De Moor is a full professor at the Katholieke Universiteit Leuven, Belgium. Research supported by (1) Engineering Research Center of Metallurgical Automation and Measurement Technology (ERCMAMT), Ministry of Education, 430081, Hubei, China, and China Scholarship Council (CSC, No. 2006153005); (2) Research Council KUL: GOA Ambiorics, GOA MaNet, CoE EF/05/006 Optimization in Engineering (OPTEC), IOF-SCORES4CHEM, several PhD/postdoc and fellow grants; (3) FWO: PhD/postdoc grants; projects G0226.06 (cooperative systems and optimization), G0321.06 (Tensors), G.0302.07 (SVM/Kernel), G.0320.08 (convex MPC), G.0558.08 (Robust MHE), G.0557.08 (Glycemia2), G.0588.09 (Brain-machine); research communities (ICCoS, ANMMM, MLDM); G.0377.09 (Mechatronics MPC); (4) IWT: PhD Grants, Eureka-Flite+, SBO LeCoPro, SBO Climaqs, SBO POM, O&O-Dsquare; (5) Belgian Federal Science Policy Office: IUAP P6/04 (DYSCO, Dynamical systems, control and optimization, 2007–2011); (6) EU: ERNSI; FP7-HD-MPC (INFSO-ICT-223854), COST intelliCIS, FP7-EMBOCON (ICT-248940); (7) Contract Research: AMINAL; Other: Helmholtz: viCERP; ACCM; Bauknecht; Hoerbiger; (8) Flemish Government: Center for R&D Monitoring. (9) We thank Dr. Carlos Alzate (K.U. Leuven) for helpful discussions. The scientific responsibility is assumed by the authors.

Author information


Correspondence to Xinhai Liu.


About this article

Cite this article

Liu, X., Glänzel, W., & De Moor, B. (2011). Hybrid clustering of multi-view data via Tucker-2 model and its application. Scientometrics, 88, 819–839. https://doi.org/10.1007/s11192-011-0348-3


  • DOI: https://doi.org/10.1007/s11192-011-0348-3
