Abstract
Recent advances in technology have allowed researchers to simultaneously collect large-scale, complex biological data, often in matrix format. In genomic studies, for instance, measurements from tens to hundreds of thousands of genes are taken from individuals across several experimental groups. In time course microarray experiments, gene expression is measured at several time points for each individual across the whole genome, resulting in a high-dimensional matrix for each gene. In such experiments, researchers are faced with high-dimensional longitudinal data. Unfortunately, traditional methods for longitudinal data are not appropriate for high-dimensional situations. In this paper, we use the growth curve model to introduce a test for high-dimensional longitudinal data and evaluate its performance using simulations. We also show how our approach can be used to filter genes in time course genomic experiments. We illustrate this using publicly available genomic data from experiments comparing normal human lung tissue with vanadium pentoxide-treated human lung tissue, designed to understand the susceptibility of individuals working in petrochemical factories to airway remodelling. Using our method, we filtered a pool of 22,277 genes down to 1053 (about 5%) non-noise genes. Although our focus is on hypothesis testing, we also provide a modified maximum likelihood estimator for the mean parameter of the growth curve model and assess its performance through bias and mean squared error.
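The abstract does not spell out the model. As background, the classical growth curve model writes the p × n data matrix as X = ABC + E, where A (p × q) is the within-individual design (for example, polynomial terms in time), C (k × n) is the between-individual design (for example, group indicators), B (q × k) is the mean parameter, and the columns of E are independent N_p(0, Σ). The classical maximum likelihood estimator is B̂ = (A′S⁻¹A)⁻¹A′S⁻¹XC′(CC′)⁻¹ with S = X(I − C′(CC′)⁻¹C)X′, which requires S to be invertible and therefore breaks down once p exceeds n − rank(C). The Python sketch below replaces S⁻¹ with the Moore–Penrose inverse S⁺ and estimates bias and mean squared error by simulation. The pseudoinverse substitution, the AR(1) covariance, the design (15 time points, two groups of six, linear growth) and all function names are illustrative assumptions, not necessarily the modified estimator or simulation settings used in the paper.

```python
import numpy as np


def ar1_cov(p, rho):
    """AR(1) covariance Sigma[i, j] = rho ** |i - j| (an illustrative choice)."""
    idx = np.arange(p)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def simulate_gcm(A, B, C, Sigma, rng):
    """Draw X = A B C + E, where the n columns of E are i.i.d. N_p(0, Sigma)."""
    L = np.linalg.cholesky(Sigma)
    E = L @ rng.standard_normal((A.shape[0], C.shape[1]))
    return A @ B @ C + E


def residual_sscp(X, C):
    """S = X (I - C'(CC')^{-1} C) X': residual sums of squares and products."""
    n = C.shape[1]
    P = C.T @ np.linalg.solve(C @ C.T, C)  # projection onto the row space of C
    return X @ (np.eye(n) - P) @ X.T


def pinv_psd(S, rel_tol=1e-10):
    """Moore-Penrose inverse of a symmetric PSD matrix, dropping near-zero eigenvalues."""
    w, V = np.linalg.eigh(S)
    keep = w > rel_tol * w.max()  # hypothetical cutoff for "numerically zero"
    return (V[:, keep] / w[keep]) @ V[:, keep].T


def estimate_B(X, A, C):
    """(A'S^+A)^{-1} A'S^+ X C'(CC')^{-1}.

    If S is invertible (p <= n - rank(C)), S^+ = S^{-1} and this is the classical
    MLE. Using S^+ when S is singular is an ASSUMED modification for illustration.
    """
    W = pinv_psd(residual_sscp(X, C))
    Y = X @ C.T @ np.linalg.inv(C @ C.T)      # p x k matrix of group mean profiles
    AtW = A.T @ W
    return np.linalg.solve(AtW @ A, AtW @ Y)  # q x k estimate of B


def bias_and_mse(A, B, C, Sigma, reps, seed):
    """Entrywise Monte Carlo bias and mean squared error of estimate_B."""
    rng = np.random.default_rng(seed)
    draws = np.array([estimate_B(simulate_gcm(A, B, C, Sigma, rng), A, C)
                      for _ in range(reps)])
    return draws.mean(axis=0) - B, ((draws - B) ** 2).mean(axis=0)


# Hypothetical design: p = 15 time points, two groups of 6 individuals, so
# n - rank(C) = 10 < p and S is singular; linear growth curves (q = 2).
p, n_per_group, k = 15, 6, 2
t = np.linspace(0.0, 1.0, p)
A = np.column_stack([np.ones(p), t])               # p x q within-individual design
C = np.kron(np.eye(k), np.ones((1, n_per_group)))  # k x n between-individual design
B = np.array([[1.0, 1.5],                          # row 0: group intercepts
              [2.0, -1.0]])                        # row 1: group slopes
Sigma = ar1_cov(p, 0.5)

S = residual_sscp(simulate_gcm(A, B, C, Sigma, np.random.default_rng(0)), C)
rank_S = np.linalg.matrix_rank(S, tol=1e-8 * np.linalg.norm(S, 2))
print(f"rank(S) = {rank_S} < p = {p}")
bias, mse = bias_and_mse(A, B, C, Sigma, reps=2000, seed=2017)
print("bias:\n", bias.round(3))
print("MSE:\n", mse.round(3))
```

Because the group mean profile matrix XC′(CC′)⁻¹ is independent of S under normality, this kind of estimator is conditionally unbiased given S, so the simulated bias should sit near zero; it is the mean squared error that separates one weighting choice from another.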









Cite this article
Jana, S., Balakrishnan, N., von Rosen, D. et al. High dimensional extension of the growth curve model and its application in genetics. Stat Methods Appl 26, 273–292 (2017). https://doi.org/10.1007/s10260-016-0369-4
DOI: https://doi.org/10.1007/s10260-016-0369-4