High dimensional extension of the growth curve model and its application in genetics

  • Original Paper
  • Statistical Methods & Applications

Abstract

Recent advances in technology have allowed researchers to collect large-scale, complex biological data simultaneously, often in matrix format. In genomic studies, for instance, measurements from tens to hundreds of thousands of genes are taken from individuals across several experimental groups. In time course microarray experiments, gene expression is measured at several time points for each individual across the whole genome, resulting in a high-dimensional matrix for each gene. In such experiments, researchers are faced with high-dimensional longitudinal data. Unfortunately, traditional methods for longitudinal data are not appropriate for high-dimensional situations. In this paper, we use the growth curve model to introduce a test suited to high-dimensional longitudinal data and evaluate its performance using simulations. We also show how our approach can be used to filter genes in time course genomic experiments. We illustrate this using publicly available genomic data from experiments comparing normal human lung tissue with vanadium pentoxide treated human lung tissue, designed with the aim of understanding the susceptibility of individuals working in petro-chemical factories to airway re-modelling. Using our method, we identified 1053 genes (about 5%) as non-noise genes from a pool of 22,277. Although our focus is on hypothesis testing, we also provide a modified maximum likelihood estimator for the mean parameter of the growth curve model and assess its performance through bias and mean squared error.
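
To illustrate why classical growth curve methods break down in high dimensions, the following minimal sketch simulates data from the standard growth curve model X = ABC + E and checks the rank of the residual sums-of-squares matrix S = X(I - C'(CC')^{-1}C)X', which the classical maximum likelihood estimator of B must invert. The notation, the polynomial within-individual design, the group sizes and the function names are assumptions made for illustration and are not taken from the paper; the sketch does not implement the test proposed here.

```python
import numpy as np


def simulate_growth_curve(p, n_per_group, q=2, k=2, seed=0):
    """Simulate X = A B C + E (hypothetical helper, illustrative settings only).

    p: number of time points, q: number of polynomial terms,
    k: number of groups, n_per_group: individuals per group.
    """
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, p)
    A = np.vander(t, N=q, increasing=True)             # p x q within-individual design
    C = np.kron(np.eye(k), np.ones((1, n_per_group)))  # k x n between-individual design
    B = rng.normal(size=(q, k))                        # q x k mean parameters
    E = rng.normal(size=(p, k * n_per_group))          # p x n errors (identity covariance assumed)
    return A @ B @ C + E, A, B, C


def residual_sscp(X, C):
    """Residual sums-of-squares matrix S = X (I - C'(CC')^{-1} C) X'."""
    n = X.shape[1]
    P = C.T @ np.linalg.solve(C @ C.T, C)  # projection onto the row space of C
    return X @ (np.eye(n) - P) @ X.T


def numerical_rank(S):
    """Rank of S with an explicit tolerance relative to its largest singular value."""
    return int(np.linalg.matrix_rank(S, tol=1e-8 * np.linalg.norm(S, 2)))


# Low-dimensional case: p = 4 time points, n = 20 individuals, k = 2 groups.
X_low, _, _, C_low = simulate_growth_curve(p=4, n_per_group=10)
rank_low = numerical_rank(residual_sscp(X_low, C_low))

# High-dimensional case: p = 50 time points, same n = 20 individuals.
X_high, _, _, C_high = simulate_growth_curve(p=50, n_per_group=10)
rank_high = numerical_rank(residual_sscp(X_high, C_high))

print(f"p = 4:  rank(S) = {rank_low} of 4")
print(f"p = 50: rank(S) = {rank_high} of 50")
```

With p = 4 the matrix S has full rank, whereas with p = 50 its rank is capped at n - k = 18, so S is singular and the classical estimator cannot be computed as written.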



Author information

Corresponding author

Correspondence to Jemila Seid Hamid.

About this article

Cite this article

Jana, S., Balakrishnan, N., von Rosen, D. et al. High dimensional extension of the growth curve model and its application in genetics. Stat Methods Appl 26, 273–292 (2017). https://doi.org/10.1007/s10260-016-0369-4

  • DOI: https://doi.org/10.1007/s10260-016-0369-4
