Abstract
Longitudinal data used as repeated measures may capture the proportion of total variance due to genetic factors with greater sensitivity than cross-sectional data. In brain imaging studies of older adults, however, brain tissue declines steadily over time, so estimation methods for longitudinal data must properly model both within-subject variation and the rate of tissue atrophy. To date, neuroimaging studies have been limited to using only two timepoints, and have not considered diagnosis-specific trends in clinically heterogeneous samples. Modeling temporal patterns of brain structure specific to neurodegenerative disease, while simultaneously assessing the contribution of genetic and environmental risk factors, is essential to understanding and predicting disease progression. We use data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) to model the genetic effects on brain cortical measurements from three repeated measures across two years. We refine our model for specific diagnostic groups, including cognitively normal elderly individuals, individuals with mild cognitive impairment (MCI) and individuals with Alzheimer’s disease (AD), and then distinguish between those who remain stable and those who convert to AD. We propose a support vector based, longitudinal autoregressive linear mixed model (ARLMM) for long-term repeated measurements, offering greater sensitivity than cross-sectional analyses of baseline scans alone.
1 Introduction
Compared to the standard cross-sectional approach for neuroimaging studies, longitudinal clinical designs give important insights into the temporal dynamics of normal aging or of an underlying disease process, and provide significantly increased statistical power by reducing the confounding effects of between-subject variability [1,2,3]. Once a complex trait is established as heritable [4], identifying specific genetic markers that explain trait variability may help researchers gain deeper insights into its underlying biological mechanism and offer new targets for therapies.
Most disorders of the brain are still without cures and many lack effective treatments, while the prevalence of later-life cognitive impairment and neurodegenerative disorders is increasing. There is, therefore, an urgent need to understand the genetic and environmental risk factors that may contribute to abnormal brain decline. Longitudinal brain imaging studies such as the multi-site North American Alzheimer’s Disease Neuroimaging Initiative (ADNI) [5] acquire clinical, imaging, genetic and biospecimen data that help track cognitive change over time, and aim to identify brain imaging biomarkers that will help predict such changes in clinical populations.
In genome-wide association studies (GWAS) of human disorders, including Alzheimer’s disease (AD) [6], each of the millions of single nucleotide polymorphisms (SNPs) across the genome is tested for association with the disorder, to identify specific genetic loci that may confer increased risk. Individually, SNPs typically explain less than 1% of the population variance in a trait [7], so GWAS usually involve tens to hundreds of thousands of individuals. Polygenic risk scores (PRS), with weights derived from the results of a large and well-powered GWAS (the training data), provide an aggregate measure of genetic risk for each individual in an independent (testing) dataset [8]. Other than large-scale biobanks that include brain imaging [9], single cohort studies of under ten thousand subjects are typically not well powered for performing GWAS. Therefore, instead of single-SNP analyses, here we assess the effect of PRS for AD on brain measures.
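As an illustration of how such a score is formed, the following minimal sketch computes a PRS for one individual as a weighted sum of allele dosages over the SNPs that pass a p-value threshold. The function name, argument names and example values are hypothetical and are not taken from the ADNI or IGAP pipelines.

```python
def polygenic_risk_score(dosages, effect_sizes, p_values, p_threshold=1.0):
    """Weighted sum of allele dosages over SNPs that pass a p-value threshold.

    dosages      : per-SNP effect-allele counts (0, 1 or 2) for one individual
    effect_sizes : per-SNP effect sizes from the training GWAS (e.g. log odds ratios)
    p_values     : per-SNP GWAS p-values, used only to select SNPs
    p_threshold  : SNPs with p <= p_threshold contribute to the score
    """
    if not (len(dosages) == len(effect_sizes) == len(p_values)):
        raise ValueError("all inputs must have one entry per SNP")
    return sum(
        dosage * weight
        for dosage, weight, p in zip(dosages, effect_sizes, p_values)
        if p <= p_threshold
    )


# Hypothetical three-SNP example: only the first and third SNPs pass p <= 1e-5.
example_score = polygenic_risk_score(
    dosages=[2, 1, 0],
    effect_sizes=[0.2, -0.1, 0.5],
    p_values=[1e-8, 0.3, 1e-6],
    p_threshold=1e-5,
)
```

In practice the dosages must be coded for the same effect allele as the GWAS summary statistics, and SNPs in linkage disequilibrium are usually pruned or clumped before scoring; neither step is shown here.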
In most longitudinal imaging studies of brain aging and neurodegeneration, including most other ADNI publications, researchers only use two timepoints – the baseline and a single follow-up – to assess the longitudinal changes as a simple difference. As there are relatively small sample sizes in neuroimaging studies compared to other epidemiological or genetic studies, this can unnecessarily reduce the available power. Repeat measures taken on a single individual at different times are almost always correlated, with measures taken closer in time being more highly correlated than measures taken further apart in time [10]. Ignoring the dependence of measures both within and across individuals may increase Type 1 error and reduce statistical power [11], especially when there is a known correlation structure such as the genetic relationship matrix (GRM) [4], which captures the genetic similarity across individuals using genome-wide polymorphism information, or other mixed confounding effects such as MRI scanner differences. Furthermore, in longitudinal studies of multi-diagnostic populations, such as ADNI, an additional correlation structure may exist between the time-series of observations from individuals within the same diagnostic classification. Accounting for this diagnostic-specific correlation structure may help to map the trajectories of decline with greater specificity, and improve precision in statistical inferences of genetic or fixed-effects risk variables [10]. This is particularly important for those individuals who ultimately convert to Alzheimer’s disease, allowing us to better identify at-risk individuals before the onset of dementia.
Mixed models that are appropriate for longitudinal genetic applications, such as GWAS, include longitudinal MMHE [3] and L-GATOR [11]. Longitudinal MMHE is based on moment-matching learning. This greatly speeds up the training process, but relies heavily on a large sample size to ensure accuracy. L-GATOR is a prospective likelihood score test based on mixed models, which introduces a random effect with an exponential covariance structure to model the phenotypic autocorrelation, but does not focus on modeling temporal structure specific to different clinical settings.
To the best of our knowledge, genetic influences on brain structure have not been modeled with more than two timepoints, nor have prior efforts attempted to ensure proper modeling of known genetic, environmental, and diagnostic sources of correlation across individual measurements. To better understand disease progression trajectories by modeling spatial-temporal brain structure patterns, we propose a support vector based, longitudinal autoregressive linear mixed model (ARLMM) for long-term repeated measurements. This model includes an autoregressive random effect to account for diagnostic and site variability, and we use it to model the effects of PRS on cortical thickness in ADNI. We also analyzed the genetic associations separately in individual diagnostic groups, as the \( \varepsilon \)-insensitive loss implemented in support vector regression (SVR) [12] is applicable to smaller samples.
2 Support Vector Autoregressive Mixed Model
2.1 Model Specification
We consider the matrix form of the Autoregressive Linear Mixed Model (ARLMM)
Suppose we have imaging phenotypes, covariates and genotype data for \( N \) individuals, each with \( n \) \( \left( {n \ge 3} \right) \) repeated measurements. \( Y \) \( \left( {{\text{size}}\,nN \times 1} \right) \) denotes the imaging phenotypes in chronological order, and \( X \) \( \left( {{\text{size}}\,nN \times q} \right) \) denotes a covariate matrix, which may include time-varying covariates such as age and static covariates such as sex [11]. When testing PRS association, we represent the PRS as a vector \( x \) (size \( nN \times 1 \)), while \( \beta \) (a scalar) and \( b \) (size \( q \times 1 \)) are the corresponding coefficients.
Brain structural measures are denoted as \( Y \), which is modeled as the sum of the linear trend and Gaussian-distributed random effects. The distributions of the random effects are specified by unknown variance components \( \sigma_{g}^{2} , \sigma_{t}^{2} , \sigma_{s}^{2} , \sigma_{e}^{2} \), and through known or modeled relationships including the GRM \( \varSigma_{g} \), block-diagonal matrices \( \varSigma_{s} \) and \( \varSigma_{t} \), and the identity matrix \( I \). There are two types of random effects: within-subject variations \( g \) and \( t \), and between-subject variations \( s \) and \( e \). \( g \) and \( t \) represent genetic relatedness and autocorrelation of a single individual’s phenotypes over time, respectively; \( s \) corresponds to the between-subject variation due to scanner differences, where each subblock of \( \varSigma_{s} \) is a matrix of ones corresponding to individuals whose images were acquired on the same scanner. \( e \) represents between-subject environmental errors.
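Read together, the definitions above are consistent with a model of the following form. This is a sketch reconstructed from the text rather than the typeset equations of the paper: the zero-mean Gaussian form of each random effect and their mutual independence (which gives the additive covariance in the last line) are assumptions, and \( \varSigma_{g} \) is taken to be the GRM expanded to size \( nN \times nN \) over the repeated measures.

```latex
\begin{aligned}
Y &= x\beta + Xb + g + t + s + e, \\
g &\sim \mathcal{N}\!\left(0,\ \sigma_{g}^{2}\varSigma_{g}\right), \quad
t \sim \mathcal{N}\!\left(0,\ \sigma_{t}^{2}\varSigma_{t}\right), \quad
s \sim \mathcal{N}\!\left(0,\ \sigma_{s}^{2}\varSigma_{s}\right), \quad
e \sim \mathcal{N}\!\left(0,\ \sigma_{e}^{2} I\right), \\
\operatorname{cov}(Y) &= \sigma_{g}^{2}\varSigma_{g} + \sigma_{t}^{2}\varSigma_{t}
  + \sigma_{s}^{2}\varSigma_{s} + \sigma_{e}^{2} I .
\end{aligned}
```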
There are 5 diagnostic categories for the ADNI participants across the three assessed time points, corresponding to stable clinical diagnoses and those who convert: (1) stable cognitively normal controls (sCN); (2) stable mild cognitive impairment (sMCI); (3) early converting MCI (ecMCI): those who were categorized as MCI at baseline, but converted to AD by the time of their first follow up at 12-months and remained AD at 24-months; (4) late converting MCI (lcMCI): those who were categorized as MCI but converted to AD between the 12 and 24-month follow-ups; and (5) stable AD (sAD).
To account for the diagnosis-specific time-varying correlation, we assume \( \varSigma_{t} \) contains unknown parameters. We hypothesize that for subject \( i \) at time point \( j \), \( t_{ij} \) is proportional to the most recent disease progression information \( t_{i,j - 1} \) under a first-order autoregressive model, in which the autoregressive fluctuations are absorbed by the between-subject environmental errors.
The disease progression parameter \( \rho_{ij} \) is assumed to depend only on the diagnostic classifications at time points \( j \) and \( j - 1 \). Thus, it is specified for each type of transition: the stable groups \( \alpha \) (CN -> CN), \( \beta \) (MCI -> MCI), \( \gamma \) (AD -> AD), and conversion \( \theta \) (MCI -> AD). For each subject, the block-diagonal submatrices of \( \varSigma_{t} \) have the following form:
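One construction consistent with the first-order autoregressive assumption is sketched below. It uses the standard unit-diagonal AR(1) correlation form, in which the entry linking two visits is the product of the transition parameters between them; this choice, the function names and the numerical values are assumptions made for illustration, and the blocks used in the paper may differ in detail.

```python
import numpy as np


def transition_params(diagnoses, alpha, beta, gamma, theta):
    """Map a subject's sequence of visit-level diagnoses to transition parameters."""
    lookup = {
        ("CN", "CN"): alpha,    # stable cognitively normal
        ("MCI", "MCI"): beta,   # stable MCI
        ("AD", "AD"): gamma,    # stable AD
        ("MCI", "AD"): theta,   # conversion
    }
    return [lookup[(prev, cur)] for prev, cur in zip(diagnoses[:-1], diagnoses[1:])]


def ar1_block(rhos):
    """Per-subject block of Sigma_t under an assumed unit-diagonal AR(1) form.

    rhos[k] links visit k and visit k + 1, so n visits need n - 1 values.
    Entry (j, k) is the product of the parameters between visits j and k.
    """
    n = len(rhos) + 1
    block = np.eye(n)
    for j in range(n):
        for k in range(j + 1, n):
            block[j, k] = block[k, j] = np.prod(rhos[j:k])
    return block


# Hypothetical parameter values, for illustration only.
params = dict(alpha=0.9, beta=0.8, gamma=0.7, theta=0.5)

# Late-converting MCI: MCI at baseline and 12 months, AD at 24 months.
lc_mci_block = ar1_block(transition_params(["MCI", "MCI", "AD"], **params))
```

Under this form, a stable CN subject has the familiar block with off-diagonals \( \alpha , \alpha^{2} \), while a late converter mixes \( \beta \) and \( \theta \); the full \( \varSigma_{t} \) would stack one such block per subject along the diagonal.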
2.2 Parameter Estimation
Step 1.
To obtain robust and efficient estimates of the variance components \( \sigma_{g}^{2} , \sigma_{t}^{2} , \sigma_{s}^{2} , \sigma_{e}^{2} \), we apply an L2-regularized squared \( \varepsilon \)-insensitive loss to the projected phenotypic covariance and the projected random-effect covariances. Here, \( P \) is an orthogonal projection matrix that satisfies \( P^{T} \left[ {X, x} \right] = 0 \), and therefore projects out both the covariates and the PRS in Eq. (3). Variance component estimates are obtained by minimizing the L2-regularized loss \( L \) via stochastic gradient descent.
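The ingredients of this step can be sketched as follows: a matrix \( P \) whose columns span the orthogonal complement of the design, the squared \( \varepsilon \)-insensitive penalty, and a moment-matching objective that compares the projected phenotypic covariance with the projected model covariance. The function names, the hyperparameters `eps` and `lam`, and the choice to regularize the variance components directly are assumptions; the sketch only evaluates the objective and does not implement the stochastic gradient descent itself.

```python
import numpy as np


def null_space_projection(design):
    """Return P with orthonormal columns such that P.T @ design == 0.

    Assumes `design` (rows = observations) has full column rank.
    """
    # In a complete QR factorization, the trailing columns of Q span the
    # orthogonal complement of the column space of `design`.
    q, _ = np.linalg.qr(design, mode="complete")
    return q[:, design.shape[1]:]


def squared_eps_insensitive(residual, eps):
    """Zero inside the eps-tube, quadratic in the excess outside it."""
    return np.maximum(0.0, np.abs(residual) - eps) ** 2


def moment_matching_loss(variances, y, design, covariances, eps=0.1, lam=0.01):
    """Objective sketch: data term plus an L2 penalty on the variance components."""
    p = null_space_projection(design)
    y_proj = p.T @ y
    target = np.outer(y_proj, y_proj)  # projected phenotypic covariance
    model = sum(v * (p.T @ s @ p) for v, s in zip(variances, covariances))
    data_term = squared_eps_insensitive(target - model, eps).sum()
    return data_term + lam * np.sum(np.square(variances))


# Tiny synthetic check with hypothetical data: 6 observations, 2 design columns.
rng = np.random.default_rng(0)
design = rng.normal(size=(6, 2))
y = rng.normal(size=6)
P = null_space_projection(design)
loss_value = moment_matching_loss([1.0], y, design, [np.eye(6)], eps=0.1, lam=0.0)
```

Because residuals smaller than `eps` contribute nothing, small mismatches between the empirical and modeled covariances do not drive the fit, which is the property the text relies on for tolerance to slight misspecification.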
In our implementation, we first solved the optimization problem in the stable clinical groups, then inferred the parameter \( \theta \) in the converter groups using the estimated disease progression parameters \( \alpha \), \( \beta \) and \( \gamma \).
Our optimization strategy may be viewed as one of the moment-matching methods for mixed models [3], but it simultaneously learns the unknown variance components and the disease progression parameters, which may provide a better description of the disease progression trajectories for stable and converting groups. Furthermore, the squared \( \varepsilon \)-insensitive loss ignores any training data close to (within a threshold \( \varepsilon \) of) the predicted phenotypic covariances. This tolerates slight misspecification of the covariance structures of the random effects, which may reduce the bias of the fixed-effect estimates in both small and large samples [10]. Instead of minimizing the observed training errors, the squared \( \varepsilon \)-insensitive loss attempts to minimize the generalization error bound [12], an important consideration for reproducibility in datasets of clinical populations.
Step 2.
After the covariance structures for all random effects are determined, the coefficients \( b \) and \( \beta \) (the latter only in the PRS association) have a closed form that minimizes the L2 loss. Delete-one jackknife resampling provided estimated standard errors and p-values for all associations.
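A minimal sketch of this step is given below, assuming the closed form is the generalized least squares (GLS) estimator under the fitted covariance and that the jackknife deletes one subject (all of that subject's visits) at a time. Both readings, and all names in the code, are assumptions made for illustration.

```python
import numpy as np


def gls_coefficients(design, y, cov):
    """Generalized least squares: (Z' V^-1 Z)^-1 Z' V^-1 y."""
    vinv_design = np.linalg.solve(cov, design)
    vinv_y = np.linalg.solve(cov, y)
    return np.linalg.solve(design.T @ vinv_design, design.T @ vinv_y)


def jackknife_se(design, y, cov, groups):
    """Delete-one-group jackknife standard errors for the GLS coefficients.

    `groups` labels each row (e.g. a subject ID); every replicate drops all
    rows of one group and refits on the rest.
    """
    groups = np.asarray(groups)
    labels = np.unique(groups)
    estimates = []
    for label in labels:
        keep = groups != label
        estimates.append(
            gls_coefficients(design[keep], y[keep], cov[np.ix_(keep, keep)])
        )
    estimates = np.array(estimates)
    m = len(labels)
    centered = estimates - estimates.mean(axis=0)
    return np.sqrt((m - 1) / m * np.sum(centered ** 2, axis=0))


# Synthetic check with hypothetical data: 4 subjects x 3 visits, noise-free y.
rng = np.random.default_rng(1)
Z = rng.normal(size=(12, 2))
y = Z @ np.array([1.0, -2.0])
subject_ids = np.repeat(np.arange(4), 3)
coef = gls_coefficients(Z, y, np.eye(12))
se = jackknife_se(Z, y, np.eye(12), subject_ids)
```

Here `Z` stands for the stacked design \( \left[ {x, X} \right] \); a test statistic can then be formed as the ratio of each coefficient to its jackknife standard error.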
3 Experiments
3.1 Alzheimer’s Disease Neuroimaging Initiative (ADNI)
T1-weighted MRI brain imaging data from 593 ADNI participants scanned at baseline, 12 months and 24 months were used in this analysis. Images were acquired on 1.5T and 3T scanners during ADNI1 and ADNI2, respectively, using similar imaging parameters [5]. We excluded subjects with non-matched or missing genetic information or inaccurate FreeSurfer measures. If additional follow-up information was available, any individual who converted (from CN to MCI or AD, or from MCI to AD) after the 24-month follow-up was excluded. The demographic information for the diagnostic groups is detailed in Table 1.
We extracted cortical thickness measures from 34 target regions of interest (ROIs). As the spatial-temporal pattern of brain atrophy in ADNI is complex and highly variable, longitudinal cortical thickness measures were processed with a density based spatial-temporal clustering algorithm, DBSCAN, to remove outliers [13].
3.2 Polygenic Risk Scores and Genetic Relationship Matrix
To estimate pairwise genetic similarity in ADNI, the GRM was built from 644,855 SNPs (minor allele frequency > 0.01) [4]. Weights for the PRS were taken from the stage 1 results of the International Genomics of Alzheimer’s Project (IGAP) GWAS [14], which served as the training set for deriving SNP effect sizes. To assess only the impact of the most significant SNPs, we computed the PRS from SNPs that met a GWAS p-value threshold of 0.00001. Importantly, the ADNI participants used in this analysis were not included in stage 1 of the IGAP GWAS from which the weights were derived.
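For reference, a GRM of the kind described in [4] can be sketched as the average, over SNPs, of products of standardized genotypes. The function name, the use of sample allele frequencies, and the toy genotype matrix are assumptions; dedicated tools also treat the diagonal and missing genotypes more carefully than this sketch does.

```python
import numpy as np


def genetic_relationship_matrix(genotypes, maf_min=0.01):
    """GRM from an (individuals x SNPs) array of allele counts in {0, 1, 2}.

    Each SNP is centered by twice its allele frequency and scaled by its
    binomial standard deviation; the GRM is the mean cross-product over SNPs.
    """
    g = np.asarray(genotypes, dtype=float)
    freq = g.mean(axis=0) / 2.0               # sample allele frequency per SNP
    maf = np.minimum(freq, 1.0 - freq)
    keep = maf > maf_min                      # drop rare and monomorphic SNPs
    g, freq = g[:, keep], freq[keep]
    z = (g - 2.0 * freq) / np.sqrt(2.0 * freq * (1.0 - freq))
    return z @ z.T / z.shape[1]


# Toy example: 4 individuals, 4 SNPs (hypothetical genotypes).
toy_genotypes = np.array([
    [0, 1, 2, 0],
    [1, 1, 0, 2],
    [2, 0, 1, 1],
    [0, 2, 1, 1],
])
toy_grm = genetic_relationship_matrix(toy_genotypes)
```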
3.3 Experimental Settings
Our ARLMM was applied to each of the five clinical groups to test for associations between PRS and cortical thickness, averaged across the left and right hemispheres, in each of 34 FreeSurfer-defined ROIs; covariates included age, sex, intracranial volume (ICV), and a dummy variable for field strength. We compared the proposed longitudinal mixed model with two closely related cross-sectional counterparts, multivariate linear regression (LR) and support vector regression (SVR), each fitted using only the data collected at baseline.
To prevent overfitting and improve the model’s predictive performance, we used 10-fold cross-validation based on the mean squared error (MSE), with early stopping on a 10% validation split.
3.4 Experimental Results
After Bonferroni correction across the 34 cortical regions assessed, PRS was negatively associated with parahippocampal thickness (p = 0.0007) in the sMCI group, and positively associated with the thickness of the lingual gyrus (p = 0.0014) in the late-converting lcMCI group (see Fig. 1). Using our approach, the disease progression parameters \( \alpha \), \( \beta \), \( \gamma \), and \( \theta \) were significant (Bonferroni corrected) across almost all 34 regions, and may capture the temporal pattern of cortical changes in the different clinical groups. \( \theta \), corresponding to the conversion groups, was significantly lower than the disease progression parameters \( \alpha \), \( \beta \), \( \gamma \) for the stable groups.
Fig. 1. PRS z-scores of cortical thickness in 34 ROIs for 5 clinical groups are mapped onto the brain. Brain regions where PRS passes the Bonferroni-corrected threshold of p < 0.05/34 are marked with red asterisks. Using ARLMM, PRS is found to be negatively associated with the thickness of the parahippocampal cortex (p = 0.0007) in the sMCI group, and positively associated with the thickness of the lingual gyrus (p = 0.0014) in the later-converting MCI group (Color figure online).
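The Bonferroni threshold quoted above is simple arithmetic, and the short check below makes it explicit for the two reported p-values. The function names are illustrative only.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def passes_bonferroni(p_value, alpha=0.05, n_tests=34):
    """True if p_value is below the corrected threshold alpha / n_tests."""
    return p_value < bonferroni_threshold(alpha, n_tests)


threshold = bonferroni_threshold(0.05, 34)      # approximately 0.00147
parahippocampal_ok = passes_bonferroni(0.0007)  # sMCI, parahippocampal cortex
lingual_ok = passes_bonferroni(0.0014)          # lcMCI, lingual gyrus
```

Both reported p-values fall below 0.05/34, the lingual gyrus result only narrowly.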
Neither linear regression nor support vector regression detected significant associations between PRS and cortical thickness (all p > 0.05). These methods do not capture the temporal correlation in brain structure and are appropriate only when individuals are unrelated; we compare against them because they are the methods most commonly used to assess genetic influences on brain structure.
4 Discussion
Here, we proposed a longitudinal autoregressive linear mixed model suited to detecting genetic associations in clinical subsets of data with three time points. ARLMM could be further extended to allow unequal numbers of measurements, as long as at least one individual is measured at three or more time points. A limitation of the current method is that it cannot handle missing genotype or phenotype data, or participants with fewer measurements; addressing this is among our planned future directions. We also note that the genetic relationship among ADNI participants is not very strong, and therefore this model may show greater promise in cohorts with stronger genetic relationships. Furthermore, to establish the robustness of our model, we can iteratively remove individuals to obtain estimates of the minimal bounds of our method, and assess its robustness when only two time points are used. In settings where clinical diagnoses of MCI or dementia are uncertain or not evaluated, subjects may instead be grouped by biological signatures (such as amyloid or tau positivity), sex, environmental exposures (such as pollutant exposure), or education level, provided the grouping variable is believed to be associated with different rates of brain atrophy.
References
1. Bernal-Rusiel, J.L., et al.: Statistical analysis of longitudinal neuroimage data with linear mixed effects models. Neuroimage 66, 249–260 (2013)
2. Guillaume, B., et al.: Fast and accurate modelling of longitudinal and repeated measures neuroimaging data. Neuroimage 94, 287–302 (2014)
3. Ge, T., et al.: Heritability analysis with repeat measurements and its application to resting-state functional connectivity. Proc. Natl. Acad. Sci. 114(21), 5521–5526 (2017)
4. Yang, J., et al.: Common SNPs explain a large proportion of the heritability for human height. Nat. Genet. 42(7), 565 (2010)
5. Jack Jr., C.R., et al.: The Alzheimer’s disease neuroimaging initiative (ADNI): MRI methods. J. Magn. Reson. Imaging 27(4), 685–691 (2008)
6. Lambert, J.C., et al.: Meta-analysis of 74,046 individuals identifies 11 new susceptibility loci for Alzheimer’s disease. Nat. Genet. 45(12), 1452 (2013)
7. Munafò, M.R., et al.: A manifesto for reproducible science. Nat. Hum. Behav. 1, 0021 (2017)
8. Khera, A.V., et al.: Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations. Nat. Genet. 50(9), 1219 (2018)
9. Elliott, L.T., et al.: Genome-wide association studies of brain imaging phenotypes in UK Biobank. Nature 562(7726), 210 (2018)
10. Littell, R.C., et al.: Modelling covariance structure in the analysis of repeated measures data. Stat. Med. 19(13), 1793–1819 (2000)
11. Wu, X., et al.: L-GATOR: genetic association testing for a longitudinally measured quantitative trait in samples with related individuals. Am. J. Hum. Genet. 102(4), 574–591 (2018)
12. Basak, D., et al.: Support vector regression. Neural Inf. Process.-Lett. Rev. 11(10), 203–224 (2007)
13. Schubert, E., et al.: DBSCAN revisited, revisited: why and how you should (still) use DBSCAN. ACM Trans. Database Syst. (TODS) 42(3), 19 (2017)
14. Ding, L., et al.: Voxelwise meta-analysis of brain structural associations with genome-wide polygenic risk for Alzheimer’s disease. In: 14th International Symposium on Medical Information Processing and Analysis, vol. 10975. International Society for Optics and Photonics (2018)
Acknowledgements
We acknowledge support from NIH grant R01AG059874 High resolution mapping of the genetic risk for disease in the human brain. Data used in preparing this paper were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) [5] dataset, which involves both phase 1 and phase 2. Many investigators within ADNI contributed to the design and implementation of ADNI, and/or provided data but did not participate in analysis or writing of this paper.
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Yang, Q. et al. (2019). Support Vector Based Autoregressive Mixed Models of Longitudinal Brain Changes and Corresponding Genetics in Alzheimer’s Disease. In: Rekik, I., Adeli, E., Park, S. (eds) Predictive Intelligence in Medicine. PRIME 2019. Lecture Notes in Computer Science(), vol 11843. Springer, Cham. https://doi.org/10.1007/978-3-030-32281-6_17
DOI: https://doi.org/10.1007/978-3-030-32281-6_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-32280-9
Online ISBN: 978-3-030-32281-6
eBook Packages: Computer Science (R0)