Abstract
In many real-world systems, individual components form groups that interact collectively. Grouping gives rise to collective relationships, and collective relationships in turn stimulate individual components to form groups. To understand the structure and functioning of such systems clearly, group formation and collective relationships must be identified at the same time. In this paper, we define the notion of collective group relationships (CGRs) between two sets of individual components and propose a method to discover CGRs from heterogeneous datasets. The method integrates canonical correlation analysis (CCA) with graph mining to find the top-k CGRs. Experiments on both synthetic and real-world datasets demonstrate the effectiveness and efficiency of the proposed method.
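To make the CCA ingredient concrete, the following is a minimal sketch of classical (unpenalized) CCA in NumPy. It is not the method proposed in the paper: the graph-mining step and the top-k search are not shown, and the function name `first_canonical_pair`, the ridge term `reg`, and the toy data are assumptions introduced purely for illustration. The sketch only shows how one pair of weight vectors links two sets of variables measured on the same samples.

```python
import numpy as np

def first_canonical_pair(X, Y, reg=1e-8):
    """Leading canonical correlation and weight vectors for X (n x p), Y (n x q).

    Hypothetical helper for illustration; `reg` is a small ridge term
    (an assumption) that keeps the covariance matrices positive definite.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)

    # Whiten both sides with Cholesky factors: Sxx = Lx Lx^T, Syy = Ly Ly^T.
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)

    # M = Lx^{-1} Sxy Ly^{-T}; its singular values are the canonical correlations.
    M = np.linalg.solve(Lx, Sxy)
    M = np.linalg.solve(Ly, M.T).T
    U, s, Vt = np.linalg.svd(M, full_matrices=False)

    # Map the leading singular vectors back to the original variable spaces.
    a = np.linalg.solve(Lx.T, U[:, 0])
    b = np.linalg.solve(Ly.T, Vt[0, :])
    return s[0], a, b

# Toy data (assumed): columns 0 and 1 of X and column 0 of Y share a latent signal z.
rng = np.random.default_rng(0)
n = 500
z = rng.standard_normal(n)
X = rng.standard_normal((n, 5))
Y = rng.standard_normal((n, 4))
X[:, 0] = z + 0.3 * rng.standard_normal(n)
X[:, 1] = z + 0.3 * rng.standard_normal(n)
Y[:, 0] = z + 0.3 * rng.standard_normal(n)

rho, a, b = first_canonical_pair(X, Y)
print("leading canonical correlation:", rho)
print("X variables ranked by |weight|:", np.argsort(-np.abs(a)))
print("Y variables ranked by |weight|:", np.argsort(-np.abs(b)))
```

Ranking variables by the magnitude of their canonical weights is only a naive stand-in for group identification. It is used here to show how a correlated pair of variable subsets can surface from two datasets, not how the paper forms its groups.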
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Karim, S.M.M., Liu, L., Li, J. (2014). Discovering Collective Group Relationships. In: Wang, H., Sharaf, M.A. (eds) Databases Theory and Applications. ADC 2014. Lecture Notes in Computer Science, vol 8506. Springer, Cham. https://doi.org/10.1007/978-3-319-08608-8_10
DOI: https://doi.org/10.1007/978-3-319-08608-8_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08607-1
Online ISBN: 978-3-319-08608-8
eBook Packages: Computer Science, Computer Science (R0)