Abstract
One of the great challenges in bioinformatics is accurately locating the protein-coding regions in a given DNA sequence. Accurate identification of protein-coding regions is useful in many applications: for instance, it helps in characterizing new proteins, in drug design, and in revealing the evolutionary background of a particular organism. DSP-based techniques are popular for protein-coding region identification. The first essential step of a DSP-based exon prediction technique is to convert the base sequence into a numerical sequence. The choice of numerical mapping scheme determines how well the characteristic features of the DNA sequence are reflected in the numerical domain, which in turn affects how accurately exons can be located. Over the last two decades, a number of mapping schemes have been used successfully for exon prediction. However, locating short exons remains a difficult task. In this paper, we propose a tri-nucleotide mapping scheme that exploits the residual volume property of amino acids to encode the given DNA sequence. The results show that the proposed tri-nucleotide mapping scheme performs better than other mapping schemes in detecting short exons.
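The two steps named above, numerical mapping followed by a sliding-window DFT that looks for three-base periodicity, can be sketched roughly as follows. This is a minimal illustration and not the paper's implementation: it uses the single-nucleotide EIIP mapping as a stand-in because the values of the proposed tri-nucleotide residual-volume mapping are not given here, and the function names `map_sequence` and `period3_power`, the rectangular window, and the default window length of 351 bases are all assumptions made for the example.

```python
import numpy as np

# EIIP values per nucleotide, used only as a stand-in mapping.
# The tri-nucleotide residual-volume mapping proposed in the paper
# is NOT reproduced here.
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}


def map_sequence(dna, table=EIIP):
    """Convert a DNA string into a numerical sequence (hypothetical helper)."""
    return np.array([table[base] for base in dna.upper()], dtype=float)


def period3_power(x, window=351):
    """Sliding-window DFT power at frequency 1/3 cycle per base.

    Returns one value per window position that fits fully inside x,
    so the output has len(x) - window + 1 entries. The window length
    is assumed to be a multiple of 3 so that the rectangular window
    does not leak the DC component into the period-3 bin.
    """
    if window % 3 != 0:
        raise ValueError("window length should be a multiple of 3")
    if len(x) < window:
        raise ValueError("sequence is shorter than the window")
    n = np.arange(window)
    kernel = np.exp(-2j * np.pi * n / 3)  # DFT basis at k = window / 3
    windows = np.lib.stride_tricks.sliding_window_view(x, window)
    return np.abs(windows @ kernel) ** 2


if __name__ == "__main__":
    # Toy input: a perfectly period-3 stretch, not real genomic data.
    toy = "ATG" * 200
    spectrum = period3_power(map_sequence(toy), window=99)
    print(spectrum.shape)
```

Peaks in the returned power profile mark candidate coding regions. A shorter window localizes short exons more sharply but gives a noisier profile, which is one reason the choice of mapping scheme matters for short exon detection.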
Abbreviations
- TBP: Three base periodicity
- STDFT: Short time discrete Fourier transform
- bp: Base pair
- TP: True positive
- FP: False positive
- TN: True negative
- FN: False negative
- DM: Discrimination measure
- EIIP: Electron ion interaction potential
- PN: Paired numeric
Cite this article
Singh, A.K., Srivastava, V.K. A tri-nucleotide mapping scheme based on residual volume of amino acids for short length exon prediction using sliding window DFT method. Netw Model Anal Health Inform Bioinforma 9, 26 (2020). https://doi.org/10.1007/s13721-020-00230-1
DOI: https://doi.org/10.1007/s13721-020-00230-1