Abstract
We use principal component analysis (PCA) to identify the exons of a gene and to analyze their internal structure. PCA is applied to the short-time Fourier transforms (STFTs) of the 64 codon sequences and the 4 nucleotide sequences. By comparing the results with those of independent component analysis (ICA), we can differentiate between exon and intron regions and examine how they are correlated in terms of the squared magnitudes of the STFTs. The experiment is carried out on gene F56F11.4 on chromosome III of C. elegans. For this data, the nucleotide-based PCA clearly identifies the exon and intron regions. The codon-based PCA reveals a weak internal structure in some exon regions but not in others. The ICA results show that the nucleotides thymine (T) and guanine (G) carry almost all of the information about the exon and intron regions for this data. We hypothesize the existence of complex exon structures that deserve more detailed analysis.
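The nucleotide-based pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not give implementation details, so everything specific here is an assumption: the binary indicator encoding of each nucleotide, the window length of 351 and step of 3, the restriction of the STFT to the period-3 frequency, the eigendecomposition-based PCA, and the synthetic toy sequence (the real data, gene F56F11.4, is not reproduced). All function names are hypothetical and are not the authors' code.

```python
import numpy as np

NUCLEOTIDES = "ACGT"


def indicator_sequences(dna):
    """Return a 4 x len(dna) binary array, one row per nucleotide (assumed encoding)."""
    seq = np.frombuffer(dna.upper().encode("ascii"), dtype=np.uint8)
    return np.array([(seq == ord(n)).astype(float) for n in NUCLEOTIDES])


def stft_power_at_period3(x, window=351, step=3):
    """Squared STFT magnitude at frequency 1/3 for each window position.

    Window length and step are illustrative choices, not values from the paper.
    """
    n = np.arange(window)
    kernel = np.exp(-2j * np.pi * n / 3.0)
    starts = range(0, len(x) - window + 1, step)
    return np.array([abs(np.dot(x[s:s + window], kernel)) ** 2 for s in starts])


def pca(features):
    """PCA of a (samples x variables) matrix via the covariance eigendecomposition."""
    centered = features - features.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]  # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return eigvals, eigvecs, centered @ eigvecs


def toy_sequence(seed=0, region=1200):
    """Synthetic stand-in: random flanks around a region with a period-3 bias in G."""
    rng = np.random.default_rng(seed)
    letters = list(NUCLEOTIDES)

    def random_part():
        return "".join(rng.choice(letters, size=region))

    biased = []
    for i in range(region):
        # First position of each triplet favors G; other positions disfavor it.
        p = [0.1, 0.1, 0.7, 0.1] if i % 3 == 0 else [0.3, 0.3, 0.1, 0.3]
        biased.append(rng.choice(letters, p=p))
    return random_part() + "".join(biased) + random_part()


dna = toy_sequence()
X = indicator_sequences(dna)
# One column per nucleotide, one row per window position.
features = np.column_stack([stft_power_at_period3(row) for row in X])
eigvals, eigvecs, scores = pca(features)
print("explained variance ratio:", eigvals / eigvals.sum())
```

In this toy setting, the first principal component score plotted against window position rises over the period-3 region and stays low over the random flanks, which is the kind of exon/intron separation the abstract reports for the nucleotide-based PCA. The codon-based variant would use 64 indicator sequences in place of 4, and the ICA comparison is not covered by this sketch.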
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hwang, C., Chiu, D., Sohn, I. (2005). Exon Structure Analysis via PCA and ICA of Short-Time Fourier Transform. In: Wang, L., Chen, K., Ong, Y.S. (eds) Advances in Natural Computation. ICNC 2005. Lecture Notes in Computer Science, vol 3611. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11539117_45
DOI: https://doi.org/10.1007/11539117_45
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28325-6
Online ISBN: 978-3-540-31858-3
eBook Packages: Computer Science, Computer Science (R0)