Abstract
Previous studies on network mining have focused primarily on learning a single task (such as classification or community detection) on a given network. This paper considers the problem of multi-task learning on heterogeneous network data. Specifically, we present a novel framework that enables one to perform classification on one network and community detection in another related network. Multi-task learning is accomplished by introducing a joint objective function that must be optimized to ensure the classes in one network are consistent with the link structure, nodal attributes, as well as the communities detected in another network. We provide both theoretical and empirical analysis of the framework. We also show that the framework can be extended to incorporate prior information about the correspondences between the clusters and classes in different networks. Experiments performed on both real-world and synthetic data sets demonstrate the effectiveness of the joint framework compared to applying classification and community detection algorithms on each network separately.
Access this article
We’re sorry, something doesn't seem to be working properly.
Please try refreshing the page. If that doesn't work, please contact support so we can address the problem.
Similar content being viewed by others
References
Banerjee S, Ramanathan K, Gupta A (2007) Clustering short texts using Wikipedia. In: SIGIR, pp 787–788
Bollobás B (1998) Modern graph theory, graduate text in mathematics. Springer, New York
Cai D, Shao Z, He X, Yan X, Han J (2005) Community mining from multi-relational networks. In: Proceedings of the 9th European conference on principles and practice of knowledge discovery in databases
Caruana R (1997) Multitask learning. Mach Learn 28: 41–75
Chen F, Tan PN, Jain AK (2009) A co-classification framework for detecting web spam and spammers in social media web sites. In: CIKM
Dhillon IS, Mallela S, Modha DS (2003) Information-theoretic co-clustering. In: Proceedings of the 9th ACM SIGKDD int’l conf on knowledge discovery and data mining. ACM Press, New York, pp 89–98
Ding C, Li T, Peng W, Park H (2006) Orthogonal nonnegative matrix tri-factorizations for clustering. In: Proceedings of the 12th ACM SIGKDD. ACM Press, New York, pp 126–135
Evgeniou T, Micchelli CA, Pontil M (2005) Learning multiple tasks with kernel methods. J Mach Learn Res 6: 615–637
Flake GW, Lawrence S, Giles CL (2000) Efficient identification of web communities. In: Sixth ACM SIGKDD international conference on knowledge discovery and data mining. ACM Press, New York, pp 150–160
Flake GW, Lawrence S, Giles CL, Coetzee FM (2002) Self-organization and identification of web communities. IEEE Comput 35: 66–71
Ford L, Fulkerson D (1956) Maximal flow through a network. Can J Math 8: 399–404
Getoor L, Diehl CP (2005) Link mining: a survey. SIGKDD Explor Newsl 7: 3–12
Grineva M, Grinev M, Turdakov D, Velikhov P (2008) Harnessing Wikipedia for smart tags clustering. In: Proceedings of the international workshop on knowledge acquisition from the social web (KASW2008)
Ho N-D (2008) Nonnegative matrix factorization algorithms and applications. PhD Thesis, Catholic University of Louvain, June 2008
Hu X, Zhang X, Lu C, Park EK, Zhou X (2009) Exploiting Wikipedia as external knowledge for document clustering. In: Proceedings of the international conference on knowledge discovery and data mining (KDD)
Kato T, Kashima H, Sugiyama M (2009) Robust label propagation on multiple networks. IEEE Trans Neural Netw 20(1): 35–44
Koren Y, Bell R, Volinsky C (2009) Matrix factorization techniques for recommender systems. IEEE Comput 42(8): 30–37
Lee DD, Seung HS (2001) Algorithms for non-negative matrix factorization. In: NIPS, vol 13, pp 556–562
Lin C-J (2007) On the convergence of multiplicative update algorithms for nonnegative matrix factorization. IEEE Trans Neural Netw 18(6): 1589–1596
Lin Y-R, Sun J, Castro P, Konuru R, Sundaram H, Kelliher A (2009) MetaFac: community discovery via relational hypergraph factorization. In: Proceedings of the 15th ACM SIGKDD int’l conf on knowledge discovery and data mining, pp 527–536
Long B, Zhang ZM, Yu PS (2005) Co-clustering by block value decomposition. In: Proceedings of the 11th ACM SIGKDD int’l conf on knowledge discovery and data mining. ACM Press, New York, pp 635–640
Long B, Zhang ZM, Yu PS (2007) A probabilistic framework for relational clustering. In: Proceedings of the 13th ACM SIGKDD international conference on knowledge discovery and data mining, pp 470–479
Long B, Yu PS, Zhang Z (2008) A general model for multiple view unsupervised learning. In: Proceedings of the 2008 SIAM international conference on data mining
Mandayam Comar P, Tan PN, Jain AK (2010a) Multi task learning on multiple related networks. In: Proceedings of the 19th ACM international conference on information and knowledge management, CIKM ’10, pp 1737–1740
Mandayam Comar P, Tan P-N, Jain AK (2010b) Identifying cohesive subgroups and their correspondences in multiple related networks. In: Proceedings of the 2010 IEEE/WIC/ACM international conference on web intelligence and intelligent agent technology, vol 01, pp 476–483
Mandayam Comar P, Tan P-N, Jain AK (2012) A framework for joint community detection across multiple related networks. Neurocomputing 76(1):93–104
Newman M (2006) Modularity and community structure in networks. Proc Natl Acad Sci USA 103: 8577–8582
Newman MEJ, Girvan M (2004) Finding and evaluating community structure in networks. Phys Rev E 69(2):026113
Senator TE (2005) Link mining applications: progress and challenges. SIGKDD Explor Newsl 7: 76–83
Shi J, Malik J (1997) Normalized cuts and image segmentation. In: Proceedings of CVPR
Tang L, Wang X, Liu H (2009a) Uncovering groups via heterogeneous interaction analysis. In: IEEE international conference on data mining (ICDM ’09)
Tang W, Lu Z, Dhillon I (2009b) Clustering with multiple graphs. In: ICDM, Miami, pp 1016–1021
Wang P, Domeniconi C (2008) Building semantic kernels for text classification using Wikipedia. In: Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD ’08
Wang P, Domeniconi C, Hu J (2008) Using Wikipedia for co-clustering based cross-domain text classification. In: Proceedings of the 2008 eighth IEEE international conference on data mining, pp 1085–1090
Wei Y-C, Cheng C-K (1989) Towards efficient hierarchical designs by ratio cut partitioning. In: IEEE International Conference on Computer-Aided Design, 1989. ICCAD-89, pp 298–301. doi:10.1109/ICCAD.1989.76957
Xue Y, Liao X, Carin L, Krishnapuram B (2007) Multi-task learning for classification with Dirichlet process priors. J Mach Learn Res 8: 35–63
Yang T, Jin R, Chi Y, Zhu S (2009) Combining link and content for community detection: a discriminative approach. In: Proceedings of the 15th ACM int’l conf on knowledge discovery and data mining, pp 927–936
Zhou D, Bousquet O, Lal TN, Weston J, Schölkopf B (2004) Learning with local and global consistency. In: Advances in Neural Information Processing Systems 16, MIT Press, pp 321–328
Zhou D, Zhu S, Yu K, Song X, Tseng BL, Zha H, Giles CL (2008) Learning multiple graphs for document recommendations. In: WWW ’08, pp 141–150
Zhu X, Ghahramani Z (2002) Learning from labeled and unlabeled data with label propagation. CMU
Author information
Authors and Affiliations
Corresponding author
Additional information
Responsible editor: Fei Wang, Hanghang Tong, Phillip Yu, Charu Aggarwal.
Rights and permissions
About this article
Cite this article
Comar, P.M., Tan, PN. & Jain, A.K. Simultaneous classification and community detection on heterogeneous network data. Data Min Knowl Disc 25, 420–449 (2012). https://doi.org/10.1007/s10618-012-0260-3
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10618-012-0260-3