Gene expression time series modeling with principal component and neural network

Soft Computing

Abstract

In this work, gene expression time series models are constructed using principal component analysis (PCA) and a neural network (NN). The main contribution of this paper is a methodology for modeling numerical gene expression time series. The PCA-NN prediction models are compared with other popular continuous prediction methods. The proposed model provides the features extracted from the gene expression time series and the ordering of the prediction accuracies. The model can therefore help practitioners gain a better understanding of the cell cycle and find dependencies among genes, which is useful for drug discovery. On two public real datasets, the PCA-NN method outperforms the other continuous prediction methods. In the time series model, we use Akaike's information criterion (AIC) and cross-validation to select a suitable NN model and avoid overparameterization.
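To make the model-selection step concrete, the following is a minimal sketch of AIC-based selection of the hidden-layer size. It is not the authors' implementation: the one-hidden-layer architecture, the least-squares form of AIC (n ln(RSS/n) + 2k), the helper names, and the residual sums of squares are all assumptions chosen for illustration.

```python
import math


def n_params(n_inputs: int, n_hidden: int) -> int:
    """Weights and biases of a one-hidden-layer network with one output (assumed architecture)."""
    return (n_inputs + 1) * n_hidden + (n_hidden + 1)


def aic(n_samples: int, rss: float, k: int) -> float:
    """Least-squares form of AIC: n * ln(RSS / n) + 2k."""
    return n_samples * math.log(rss / n_samples) + 2 * k


def select_hidden_units(n_samples: int, n_inputs: int, rss_by_hidden: dict):
    """Return the hidden-layer size with the lowest AIC, together with all scores."""
    scores = {
        h: aic(n_samples, rss, n_params(n_inputs, h))
        for h, rss in rss_by_hidden.items()
    }
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical training residual sums of squares for networks with
# 1, 2, 4 and 8 hidden units, each fed 3 principal-component inputs.
rss_by_hidden = {1: 4.0, 2: 2.0, 4: 1.9, 8: 1.85}
best, scores = select_hidden_units(n_samples=50, n_inputs=3, rss_by_hidden=rss_by_hidden)

for h in sorted(scores):
    print(f"hidden={h:2d}  params={n_params(3, h):2d}  AIC={scores[h]:8.2f}")
print("selected hidden units:", best)
```

With these made-up numbers, the larger networks reduce the residual error only slightly, so the 2k penalty dominates and the two-unit network is selected. In practice the AIC choice would be checked against cross-validation error, as the abstract describes.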

Author information

Corresponding author

Correspondence to M.K. Ng.

About this article

Cite this article

Ao, S., Ng, M. Gene expression time series modeling with principal component and neural network. Soft Comput 10, 351–358 (2006). https://doi.org/10.1007/s00500-005-0494-8
