DOI: 10.1145/2616498.2616515
research-article

Accelerating Sparse Canonical Correlation Analysis for Large Brain Imaging Genetics Data

Published: 13 July 2014

Abstract

Recent advances in acquiring high-throughput neuroimaging and genomics data provide exciting new opportunities to study the influence of genetic variation on brain structure and function. Research in this emerging field, known as imaging genetics, aims to identify associations between genetic variations such as single nucleotide polymorphisms (SNPs) and neuroimaging quantitative traits (QTs). Sparse canonical correlation analysis (SCCA) is a bi-multivariate analysis method that has the potential to reveal complex multi-SNP-multi-QT associations. However, the scale and complexity of imaging genetics data present critical computational bottlenecks that require new concepts and enabling tools. In this paper, we present our initial efforts to develop a set of massively parallel strategies to accelerate a widely used SCCA implementation provided by the Penalized Multivariate Analysis (PMA) software package. In particular, we exploit parallel packages for R, optimized mathematical libraries, and the automatic offload model for the Intel Many Integrated Core (MIC) architecture to accelerate SCCA. We create several simulated imaging genetics data sets of different sizes and use these synthetic data to perform a comparative study. Our performance evaluation demonstrates that a 2-fold speedup can be achieved by the proposed acceleration. The preliminary results show that, by combining a data-parallel strategy with the offload model for MIC, we can significantly reduce the time to knowledge discovery when applying SCCA to large brain imaging genetics data.
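
To make the multi-SNP-multi-QT idea concrete, the following is a minimal sketch of a rank-1 sparse CCA iteration based on alternating soft-thresholding. It is not the PMA code and not the accelerated implementation evaluated in the paper. It assumes centered columns, treats the within-set covariance as the identity, and uses fixed thresholds `delta_u` and `delta_v` as hypothetical tuning parameters. All function names and the toy matrices are invented for illustration.

```python
import math

def soft_threshold(vec, delta):
    """Shrink entries toward zero by delta; entries with |a| <= delta become 0."""
    out = []
    for a in vec:
        if a > delta:
            out.append(a - delta)
        elif a < -delta:
            out.append(a + delta)
        else:
            out.append(0.0)
    return out

def normalize(vec):
    """Scale to unit Euclidean length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(a * a for a in vec))
    return [a / norm for a in vec] if norm > 0 else list(vec)

def cross_product(X, Y):
    """Return K = X^T Y for row-major matrices X (n x p) and Y (n x q)."""
    p, q = len(X[0]), len(Y[0])
    return [[sum(x[i] * y[j] for x, y in zip(X, Y)) for j in range(q)]
            for i in range(p)]

def sparse_cca_rank1(X, Y, delta_u, delta_v, n_iter=20):
    """Alternate sparse updates of the SNP weights u and the QT weights v."""
    K = cross_product(X, Y)
    p, q = len(K), len(K[0])
    v = normalize([1.0] * q)  # simple deterministic starting point (assumption)
    u = [0.0] * p
    for _ in range(n_iter):
        Kv = [sum(K[i][j] * v[j] for j in range(q)) for i in range(p)]
        u = normalize(soft_threshold(Kv, delta_u))
        Ktu = [sum(K[i][j] * u[i] for i in range(p)) for j in range(q)]
        v = normalize(soft_threshold(Ktu, delta_v))
    return u, v

# Toy data (invented): 6 samples, 3 "SNP" columns, 2 "QT" columns.
# Only the first column of each matrix carries the shared signal.
signal = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]
noise_a = [0.1, -0.1, 0.1, -0.1, 0.1, -0.1]
noise_b = [0.1, 0.1, -0.2, -0.2, 0.1, 0.1]
noise_c = [-0.1, 0.1, 0.1, -0.1, -0.1, 0.1]
X = [[s, a, b] for s, a, b in zip(signal, noise_a, noise_b)]
Y = [[s, c] for s, c in zip(signal, noise_c)]

u, v = sparse_cca_rank1(X, Y, delta_u=1.0, delta_v=1.0)
```

On this toy input the thresholds zero out the weak columns, so only the first SNP and the first QT keep a nonzero weight. The dominant cost is forming and multiplying by the cross-product matrix, which is the kind of dense linear algebra that optimized math libraries and coprocessor offload are meant to speed up.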

      Published In

      XSEDE '14: Proceedings of the 2014 Annual Conference on Extreme Science and Engineering Discovery Environment
      July 2014
      445 pages
      ISBN:9781450328937
      DOI:10.1145/2616498
      • General Chair: Scott Lathrop
      • Program Chair: Jay Alameda
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      In-Cooperation

      • NSF: National Science Foundation
      • Drexel University
      • Indiana University

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. Brain Imaging Genetics
      2. Parallel Computing
      3. R
      4. Sparse Canonical Correlation Analysis

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

      XSEDE '14

      Acceptance Rates

      XSEDE '14 Paper Acceptance Rate 80 of 120 submissions, 67%;
      Overall Acceptance Rate 129 of 190 submissions, 68%
