Skip to main content
Log in

Simultaneous classification and community detection on heterogeneous network data

  • Published:
Data Mining and Knowledge Discovery Aims and scope Submit manuscript

Abstract

Previous studies on network mining have focused primarily on learning a single task (such as classification or community detection) on a given network. This paper considers the problem of multi-task learning on heterogeneous network data. Specifically, we present a novel framework that enables one to perform classification on one network and community detection in another related network. Multi-task learning is accomplished by introducing a joint objective function that must be optimized to ensure the classes in one network are consistent with the link structure, nodal attributes, as well as the communities detected in another network. We provide both theoretical and empirical analysis of the framework. We also show that the framework can be extended to incorporate prior information about the correspondences between the clusters and classes in different networks. Experiments performed on both real-world and synthetic data sets demonstrate the effectiveness of the joint framework compared to applying classification and community detection algorithms on each network separately.

This is a preview of subscription content, log in via an institution to check access.

Access this article

We’re sorry, something doesn't seem to be working properly.

Please try refreshing the page. If that doesn't work, please contact support so we can address the problem.

Similar content being viewed by others

References

  • Banerjee S, Ramanathan K, Gupta A (2007) Clustering short texts using Wikipedia. In: SIGIR, pp 787–788

  • Bollobás B (1998) Modern graph theory, graduate text in mathematics. Springer, New York

    Book  Google Scholar 

  • Cai D, Shao Z, He X, Yan X, Han J (2005) Community mining from multi-relational networks. In: Proceedings of the 9th European conference on principles and practice of knowledge discovery in databases

  • Caruana R (1997) Multitask learning. Mach Learn 28: 41–75

    Article  Google Scholar 

  • Chen F, Tan PN, Jain AK (2009) A co-classification framework for detecting web spam and spammers in social media web sites. In: CIKM

  • Dhillon IS, Mallela S, Modha DS (2003) Information-theoretic co-clustering. In: Proceedings of the 9th ACM SIGKDD int’l conf on knowledge discovery and data mining. ACM Press, New York, pp 89–98

  • Ding C, Li T, Peng W, Park H (2006) Orthogonal nonnegative matrix tri-factorizations for clustering. In: Proceedings of the 12th ACM SIGKDD. ACM Press, New York, pp 126–135

  • Evgeniou T, Micchelli CA, Pontil M (2005) Learning multiple tasks with kernel methods. J Mach Learn Res 6: 615–637

    MATH  MathSciNet  Google Scholar 

  • Flake GW, Lawrence S, Giles CL (2000) Efficient identification of web communities. In: Sixth ACM SIGKDD international conference on knowledge discovery and data mining. ACM Press, New York, pp 150–160

  • Flake GW, Lawrence S, Giles CL, Coetzee FM (2002) Self-organization and identification of web communities. IEEE Comput 35: 66–71

    Article  Google Scholar 

  • Ford L, Fulkerson D (1956) Maximal flow through a network. Can J Math 8: 399–404

    Article  MATH  MathSciNet  Google Scholar 

  • Getoor L, Diehl CP (2005) Link mining: a survey. SIGKDD Explor Newsl 7: 3–12

    Article  Google Scholar 

  • Grineva M, Grinev M, Turdakov D, Velikhov P (2008) Harnessing Wikipedia for smart tags clustering. In: Proceedings of the international workshop on knowledge acquisition from the social web (KASW2008)

  • Ho N-D (2008) Nonnegative matrix factorization algorithms and applications. PhD Thesis, Catholic University of Louvain, June 2008

  • Hu X, Zhang X, Lu C, Park EK, Zhou X (2009) Exploiting Wikipedia as external knowledge for document clustering. In: Proceedings of the international conference on knowledge discovery and data mining (KDD)

  • Kato T, Kashima H, Sugiyama M (2009) Robust label propagation on multiple networks. IEEE Trans Neural Netw 20(1): 35–44

    Article  Google Scholar 

  • Koren Y, Bell R, Volinsky C (2009) Matrix factorization techniques for recommender systems. IEEE Comput 42(8): 30–37

    Article  Google Scholar 

  • Lee DD, Seung HS (2001) Algorithms for non-negative matrix factorization. In: NIPS, vol 13, pp 556–562

  • Lin C-J (2007) On the convergence of multiplicative update algorithms for nonnegative matrix factorization. IEEE Trans Neural Netw 18(6): 1589–1596

    Article  Google Scholar 

  • Lin Y-R, Sun J, Castro P, Konuru R, Sundaram H, Kelliher A (2009) MetaFac: community discovery via relational hypergraph factorization. In: Proceedings of the 15th ACM SIGKDD int’l conf on knowledge discovery and data mining, pp 527–536

  • Long B, Zhang ZM, Yu PS (2005) Co-clustering by block value decomposition. In: Proceedings of the 11th ACM SIGKDD int’l conf on knowledge discovery and data mining. ACM Press, New York, pp 635–640

  • Long B, Zhang ZM, Yu PS (2007) A probabilistic framework for relational clustering. In: Proceedings of the 13th ACM SIGKDD international conference on knowledge discovery and data mining, pp 470–479

  • Long B, Yu PS, Zhang Z (2008) A general model for multiple view unsupervised learning. In: Proceedings of the 2008 SIAM international conference on data mining

  • Mandayam Comar P, Tan PN, Jain AK (2010a) Multi task learning on multiple related networks. In: Proceedings of the 19th ACM international conference on information and knowledge management, CIKM ’10, pp 1737–1740

  • Mandayam Comar P, Tan P-N, Jain AK (2010b) Identifying cohesive subgroups and their correspondences in multiple related networks. In: Proceedings of the 2010 IEEE/WIC/ACM international conference on web intelligence and intelligent agent technology, vol 01, pp 476–483

  • Mandayam Comar P, Tan P-N, Jain AK (2012) A framework for joint community detection across multiple related networks. Neurocomputing 76(1):93–104

    Google Scholar 

  • Newman M (2006) Modularity and community structure in networks. Proc Natl Acad Sci USA 103: 8577–8582

    Article  Google Scholar 

  • Newman MEJ, Girvan M (2004) Finding and evaluating community structure in networks. Phys Rev E 69(2):026113

    Google Scholar 

  • Senator TE (2005) Link mining applications: progress and challenges. SIGKDD Explor Newsl 7: 76–83

    Article  Google Scholar 

  • Shi J, Malik J (1997) Normalized cuts and image segmentation. In: Proceedings of CVPR

  • Tang L, Wang X, Liu H (2009a) Uncovering groups via heterogeneous interaction analysis. In: IEEE international conference on data mining (ICDM ’09)

  • Tang W, Lu Z, Dhillon I (2009b) Clustering with multiple graphs. In: ICDM, Miami, pp 1016–1021

  • Wang P, Domeniconi C (2008) Building semantic kernels for text classification using Wikipedia. In: Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD ’08

  • Wang P, Domeniconi C, Hu J (2008) Using Wikipedia for co-clustering based cross-domain text classification. In: Proceedings of the 2008 eighth IEEE international conference on data mining, pp 1085–1090

  • Wei Y-C, Cheng C-K (1989) Towards efficient hierarchical designs by ratio cut partitioning. In: IEEE International Conference on Computer-Aided Design, 1989. ICCAD-89, pp 298–301. doi:10.1109/ICCAD.1989.76957

  • Xue Y, Liao X, Carin L, Krishnapuram B (2007) Multi-task learning for classification with Dirichlet process priors. J Mach Learn Res 8: 35–63

    MATH  MathSciNet  Google Scholar 

  • Yang T, Jin R, Chi Y, Zhu S (2009) Combining link and content for community detection: a discriminative approach. In: Proceedings of the 15th ACM int’l conf on knowledge discovery and data mining, pp 927–936

  • Zhou D, Bousquet O, Lal TN, Weston J, Schölkopf B (2004) Learning with local and global consistency. In: Advances in Neural Information Processing Systems 16, MIT Press, pp 321–328

  • Zhou D, Zhu S, Yu K, Song X, Tseng BL, Zha H, Giles CL (2008) Learning multiple graphs for document recommendations. In: WWW ’08, pp 141–150

  • Zhu X, Ghahramani Z (2002) Learning from labeled and unlabeled data with label propagation. CMU

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Prakash Mandayam Comar.

Additional information

Responsible editor: Fei Wang, Hanghang Tong, Phillip Yu, Charu Aggarwal.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Comar, P.M., Tan, PN. & Jain, A.K. Simultaneous classification and community detection on heterogeneous network data. Data Min Knowl Disc 25, 420–449 (2012). https://doi.org/10.1007/s10618-012-0260-3

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10618-012-0260-3

Keywords

Navigation