Abstract
We consider the problem of finding sparse associations between two sources of data, for example between genetic variations (e.g., single nucleotide polymorphisms, SNPs) and phenotypical features (e.g., magnetic resonance imaging, MRI) in the study of Alzheimer’s disease (AD). Despite the success of Canonical Correlation Analysis (CCA) and its sparse variants in a number of applications, these methods usually neglect the natural tree structures underlying SNP and MRI data. Specifically, in SNP data, the whole candidate set, its genes, and the SNPs of each gene form a path in a tree; in MRI data, the whole image, its regions, and the features of each region form a path in a tree. To model the tree structure of the features in both sources of data, we propose Tree-guided Sparse Canonical Correlation Analysis (TSCCA). The proposed model equips CCA with mixed-norm regularization terms that model the underlying multilevel tree structures among both the inputs and the outputs. To solve the resulting optimization problem, we introduce an efficient iterative algorithm for TSCCA that rewrites the tree-structured regularization in the common form of the overlapping group lasso. To evaluate the proposed model, we designed a simulation study and a real-world study on Alzheimer’s disease. Results on the simulation study show that the proposed method outperforms CCA with Lasso and with group Lasso. The real-world study shows that our model can find biologically meaningful associations between SNPs and MRI features.
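To make the idea of writing a tree-structured regularizer as an overlapping group lasso concrete, the following is a minimal sketch and not the authors' implementation. It assumes that every node of the tree defines one group containing all leaf features beneath it, and that the penalty is a weighted sum of the L2 norms of those groups; the helper names `tree_groups` and `tree_penalty`, the nested-list tree encoding, the unit weights, and the toy data are all hypothetical, since the abstract does not specify the exact norms or weights used in TSCCA.

```python
import numpy as np

def tree_groups(tree):
    """Flatten a nested tree into one index group per node.

    A leaf is an int (a feature index); an internal node is a list of
    children. Each node contributes the list of leaf indices beneath it,
    so groups of a parent and its children overlap by construction.
    """
    groups = []

    def collect(node):
        if isinstance(node, int):
            leaves = [node]
        else:
            leaves = [i for child in node for i in collect(child)]
        groups.append(leaves)
        return leaves

    collect(tree)
    return groups

def tree_penalty(w, groups, weights=None):
    """Overlapping group lasso penalty: sum_g weight_g * ||w_g||_2."""
    if weights is None:
        weights = np.ones(len(groups))  # assumption: unit weight per node
    return float(sum(gw * np.linalg.norm(w[g]) for gw, g in zip(weights, groups)))

# Hypothetical toy tree: root -> two "genes" -> SNPs {0, 1} and {2, 3, 4}.
tree = [[0, 1], [2, 3, 4]]
groups = tree_groups(tree)

# A weight vector that is zero on the whole first gene.
w = np.array([0.0, 0.0, 3.0, 0.0, 4.0])
penalty = tree_penalty(w, groups)
print(groups)
print(penalty)
```

In this sketch the first gene contributes nothing to the penalty because all of its coefficients are zero, which is the kind of node-level sparsity a tree-guided penalty is meant to encourage. In a sparse CCA setting, one such penalty would be applied to each of the two canonical weight vectors, each with its own tree.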
Acknowledgements
The datasets used in this paper were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (ADNI official website: adni.loni.ucla.edu). The investigators who contributed to the design and implementation of ADNI and/or collected data are listed on the ADNI official website.
This work was supported in part by grants from NSF China (No. 61572111), a 985 Project of UESTC (No. A1098531023601041), and a Fundamental Research Project of China Central Universities (No. ZYGX2016Z003).
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Zhou, S., Yuan, S., Zhang, Z., Xu, Z. (2018). Association Study of Alzheimer’s Disease with Tree-Guided Sparse Canonical Correlation Analysis. In: Cheng, L., Leung, A., Ozawa, S. (eds) Neural Information Processing. ICONIP 2018. Lecture Notes in Computer Science, vol 11307. Springer, Cham. https://doi.org/10.1007/978-3-030-04239-4_53
DOI: https://doi.org/10.1007/978-3-030-04239-4_53
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-04238-7
Online ISBN: 978-3-030-04239-4
eBook Packages: Computer Science (R0)