A tri-nucleotide mapping scheme based on residual volume of amino acids for short length exon prediction using sliding window DFT method

  • Original Article
Network Modeling Analysis in Health Informatics and Bioinformatics

Abstract

Locating protein-coding regions accurately in a given DNA sequence is one of the major challenges in bioinformatics. Accurate identification of protein-coding regions is useful in many applications; for instance, it helps in characterizing new proteins, in drug design, and in revealing the evolutionary background of a particular organism. DSP-based techniques are widely used for protein-coding region identification. The first essential step of a DSP-based exon prediction technique is to convert the base sequence into a numerical sequence. The choice of numerical mapping scheme determines how well the characteristic features of the DNA sequence are reflected in the numerical domain, which in turn affects how accurately exons can be located. Over the last two decades, a number of mapping schemes have been used successfully for exon prediction. However, locating short exons is still a difficult task. In this paper, we propose a tri-nucleotide mapping scheme that exploits the residual volume property of amino acids to encode the given DNA sequence. The results show that the proposed tri-nucleotide mapping scheme performs better than other mapping schemes for short exon detection.
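The two steps named in the abstract, numerical mapping followed by a sliding window DFT, can be illustrated with a minimal sketch. It is not the proposed method: the residual-volume codon values are not given in this preview, so the sketch substitutes the well-known binary indicator mapping (one 0/1 sequence per base) and measures the three base periodicity as the DFT power at frequency k = N/3 in each window. The function names, the default window length of 351 bp, and the step size are assumptions chosen for illustration.

```python
import cmath
import math

BASES = "ACGT"


def indicator_sequences(dna):
    """Binary indicator mapping: one 0/1 sequence per base (stand-in mapping)."""
    dna = dna.upper()
    return {b: [1.0 if ch == b else 0.0 for ch in dna] for b in BASES}


def period3_power(window):
    """Squared magnitude of the DFT of `window` at k = N/3.

    For a window whose length N is a multiple of 3, the DFT kernel
    exp(-j*2*pi*k*m/N) at k = N/3 reduces to exp(-j*2*pi*m/3).
    """
    coeff = sum(x * cmath.exp(-2j * math.pi * m / 3)
                for m, x in enumerate(window))
    return abs(coeff) ** 2


def sliding_period3_spectrum(dna, window_len=351, step=1):
    """Period-3 power summed over the four indicator sequences,
    evaluated for each window start position."""
    if window_len % 3 != 0:
        raise ValueError("window_len should be a multiple of 3")
    seqs = indicator_sequences(dna)
    spectrum = []
    for start in range(0, len(dna) - window_len + 1, step):
        total = sum(period3_power(s[start:start + window_len])
                    for s in seqs.values())
        spectrum.append(total)
    return spectrum
```

Under these assumptions, a perfectly periodic input such as `"ATG" * 30` gives a constant non-zero value in every window, while a homopolymer such as `"A" * 90` gives values near zero. In practice, peaks in the returned list would be compared against a threshold to flag candidate coding regions. A tri-nucleotide scheme like the one proposed would replace `indicator_sequences` with a codon-level mapping.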

Abbreviations

TBP: Three base periodicity
STDFT: Short time discrete Fourier transform
bp: Base pair
TP: True positive
FP: False positive
TN: True negative
FN: False negative
DM: Discrimination measure
EIIP: Electron ion interaction potential
PN: Paired numeric

Author information

Corresponding author

Correspondence to Amit Kumar Singh.

Cite this article

Singh, A.K., Srivastava, V.K. A tri-nucleotide mapping scheme based on residual volume of amino acids for short length exon prediction using sliding window DFT method. Netw Model Anal Health Inform Bioinforma 9, 26 (2020). https://doi.org/10.1007/s13721-020-00230-1

  • DOI: https://doi.org/10.1007/s13721-020-00230-1
