Multi-type clustering in heterogeneous information networks

Lin, Wangqun; Yu, Philip S.; Zhao, Yuchen; Deng, Bo

doi:10.1007/s10115-015-0869-9

Regular Paper
Published: 19 October 2015

Knowledge and Information Systems, volume 48, pages 143–178 (2016)

Wangqun Lin¹, Philip S. Yu²,³, Yuchen Zhao² & Bo Deng¹

Abstract

Heterogeneous information networks have drawn much attention in recent years because of their applications in areas such as text mining, e-commerce, social networks, and bioinformatics. Clustering different types of objects simultaneously, using both the relations among objects of the same type and the relations between objects of different types, lets the clusterings of the different types improve one another. In this paper, we propose a general model that considers homogeneous and heterogeneous relations simultaneously to describe the structure of heterogeneous information networks, and we devise a novel parameter-free multi-type overlapping clustering approach. In this model, the different types of relations between different types of objects are represented by a group of matrices, which transforms the multi-type clustering problem into an information compression problem. We then propose greedy search approaches that aim to describe the group of relational matrices with the fewest bits. Moreover, by discovering the discriminative clusters among different types of objects, we devise effective parameter-free strategies to discover either overlapping or non-overlapping structure among different types of clusters. Extensive experiments on real-world and synthetic data sets demonstrate that our methods are effective and efficient.
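
As a rough illustration of the compression view described in the abstract, the following Python sketch scores a candidate clustering by the number of bits needed to encode a group of binary relational matrices block by block. It is a minimal sketch under stated assumptions, not the paper's encoding: the entropy-based block cost, the function names (`binary_entropy`, `block_code_length`, `total_code_length`), the toy author and paper data, and the omission of the model cost (the bits needed for the cluster assignments themselves) are all assumptions made here. Logarithms are base 2, so costs are in bits.

```python
from math import log2


def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) source; 0 for a constant block."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1.0 - p) * log2(1.0 - p)


def block_code_length(matrix, row_labels, col_labels):
    """Approximate bits to encode a binary matrix, one cluster block at a time.

    Each (row cluster, column cluster) block is charged size * H(density).
    This is an illustrative data cost only (assumption: no model cost).
    """
    ones = {}   # number of 1-entries per block
    sizes = {}  # number of entries per block
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            key = (row_labels[i], col_labels[j])
            ones[key] = ones.get(key, 0) + value
            sizes[key] = sizes.get(key, 0) + 1
    return sum(n * binary_entropy(ones[k] / n) for k, n in sizes.items())


def total_code_length(relations, labels):
    """Sum the block costs over every relational matrix in the group.

    relations: {(row_type, col_type): binary matrix as a list of lists}
    labels:    {object_type: list of cluster labels, one per object}
    """
    return sum(
        block_code_length(m, labels[a], labels[b])
        for (a, b), m in relations.items()
    )


# Hypothetical toy network: one heterogeneous relation (author-paper)
# and one homogeneous relation (paper-paper).
relations = {
    ("author", "paper"): [
        [1, 1, 0, 0],
        [1, 1, 0, 0],
        [0, 0, 1, 1],
    ],
    ("paper", "paper"): [
        [1, 1, 0, 0],
        [1, 1, 0, 0],
        [0, 0, 1, 1],
        [0, 0, 1, 1],
    ],
}

# Two candidate clusterings of the same objects.
good = {"author": [0, 0, 1], "paper": [0, 0, 1, 1]}
single = {"author": [0, 0, 0], "paper": [0, 0, 0, 0]}

good_bits = total_code_length(relations, good)
single_bits = total_code_length(relations, single)
print("two clusters per type:", good_bits, "bits")
print("one cluster per type: ", single_bits, "bits")
```

On this toy input the two-cluster labelling needs fewer bits than the single-cluster one, which is the kind of comparison a greedy search could use to accept or reject a move. Because the model cost is left out, this score never penalizes adding clusters, so it cannot by itself choose the number of clusters.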

Notes

http://www.informatik.uni-trier.de/ley/db/.
All logarithms are based on 2 in this paper.
http://www.dataminingresearch.com/index.php/2010/09/classic3-classic4-datasets.
http://people.csail.mit.edu/jrennie/20Newsgroups/.
http://www-users.cs.umn.edu/han/data/.
http://mlg.ucd.ie/datasets.
http://www.cs.uiuc.edu/homes/sun22/data/.
http://arnetminer.org/.

Acknowledgments

Wangqun Lin and Bo Deng are supported by the National Natural Science Foundation of China through Grant 61271252. Philip S. Yu and Yuchen Zhao are supported by NSF through Grant CNS-1115234, a Google Research Award, and the Pinnacle Lab at Singapore Management University.

Author information

Authors and Affiliations

Beijing Institute of System Engineering, Beijing, China
Wangqun Lin & Bo Deng
University of Illinois at Chicago, Chicago, IL, USA
Philip S. Yu & Yuchen Zhao
Institute of Data Science, Tsinghua University, Beijing, China
Philip S. Yu

Corresponding author

Correspondence to Wangqun Lin.

About this article

Cite this article

Lin, W., Yu, P.S., Zhao, Y. et al. Multi-type clustering in heterogeneous information networks. Knowl Inf Syst 48, 143–178 (2016). https://doi.org/10.1007/s10115-015-0869-9

Received: 26 August 2014
Revised: 24 June 2015
Accepted: 02 August 2015
Published: 19 October 2015
Issue Date: July 2016
DOI: https://doi.org/10.1007/s10115-015-0869-9
