Abstract
In this work, gene expression time series models are constructed using principal component analysis (PCA) and neural networks (NN). The main contribution of this paper is a methodology for modeling numerical gene expression time series. The PCA-NN prediction models are compared with other popular continuous prediction methods. The proposed model yields the features extracted from the gene expression time series and the order of the prediction accuracies. The model can therefore help practitioners gain a better understanding of the cell cycle and identify dependencies among genes, which is useful for drug discovery. On two public real datasets, the PCA-NN method outperforms the other continuous prediction methods. In the time series model, we adapt Akaike's information criterion (AIC) tests and cross-validation to select a suitable NN model and avoid overparameterization.
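The AIC-based model selection mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single-hidden-layer network fitted by least squares, uses the common Gaussian-error form AIC = n ln(RSS/n) + 2k (up to an additive constant), and the function names and residual sums of squares below are hypothetical values chosen only for illustration.

```python
import math


def aic_least_squares(rss: float, n_obs: int, n_params: int) -> float:
    """AIC for a least-squares fit with Gaussian errors (additive constant dropped)."""
    if rss <= 0 or n_obs <= 0:
        raise ValueError("rss and n_obs must be positive")
    return n_obs * math.log(rss / n_obs) + 2 * n_params


def mlp_param_count(n_inputs: int, n_hidden: int, n_outputs: int = 1) -> int:
    """Number of weights and biases in a single-hidden-layer network (assumed architecture)."""
    return (n_inputs + 1) * n_hidden + (n_hidden + 1) * n_outputs


def select_by_aic(rss_by_hidden: dict, n_obs: int, n_inputs: int):
    """Return the hidden-layer size with the lowest AIC, plus all AIC scores."""
    scores = {
        hidden: aic_least_squares(rss, n_obs, mlp_param_count(n_inputs, hidden))
        for hidden, rss in rss_by_hidden.items()
    }
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical training residual sums of squares for networks with
# 1, 2, 4 and 8 hidden units; these are NOT results from the paper.
hypothetical_rss = {1: 4.0, 2: 2.0, 4: 1.9, 8: 1.85}

best_hidden, aic_scores = select_by_aic(hypothetical_rss, n_obs=200, n_inputs=3)
for hidden, score in sorted(aic_scores.items()):
    print(f"hidden units = {hidden}: AIC = {score:.2f}")
print("selected hidden units:", best_hidden)
```

With these made-up numbers, the larger networks reduce the residual error only slightly, so the 2k penalty outweighs the gain and the smaller network is preferred. This is the sense in which AIC guards against overparameterization; cross-validation, also mentioned in the abstract, would be a complementary check on held-out data.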
Cite this article
Ao, S., Ng, M. Gene expression time series modeling with principal component and neural network. Soft Comput 10, 351–358 (2006). https://doi.org/10.1007/s00500-005-0494-8
DOI: https://doi.org/10.1007/s00500-005-0494-8