Abstract
Identifying exons in eukaryotes with digital signal processing (DSP) techniques is a challenging task in genomic signal processing because coding regions are sparse. Although many DSP techniques have been proposed, fast and accurate identification of exons remains difficult. This paper proposes an empirical mode decomposition (EMD) based adaptive noise canceller (ANC), combined with a zero-phase anti-notch filter, for improved identification of exons. The anti-notch filter extracts the period-3 property present in exons and generates the feature, whereas the EMD-based ANC removes the 1/f background noise present in that feature. The proposed technique is compared with other state-of-the-art methods at the nucleotide level using statistical measures such as the receiver operating characteristic curve, sensitivity, specificity, approximate correlation, and correlation coefficient. When applied to benchmark databases, the proposed EMD-based ANC technique outperforms the other methods discussed.
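To make the period-3 feature extraction step concrete, the following is a minimal sketch in pure Python. It is an illustration under stated assumptions, not the paper's implementation: the Voss binary indicator mapping, the second-order allpass-based anti-notch form, the pole radius of 0.992, and all function names are assumptions introduced here, and the EMD-based ANC stage is not included. Zero-phase behaviour is approximated by plain forward-backward filtering without edge padding.

```python
import math

def voss_indicators(seq):
    """Map a DNA string to four binary indicator sequences (assumed Voss mapping)."""
    seq = seq.upper()
    return {base: [1.0 if s == base else 0.0 for s in seq] for base in "ACGT"}

def iir_filter(b, a, x):
    """Direct-form IIR filter with zero initial conditions; a[0] is assumed to be 1."""
    y = []
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y.append(acc)
    return y

def zero_phase(b, a, x):
    """Forward-backward filtering: the phase cancels and the magnitude is squared."""
    forward = iir_filter(b, a, x)
    backward = iir_filter(b, a, forward[::-1])
    return backward[::-1]

def antinotch_coeffs(radius=0.992, theta=2 * math.pi / 3):
    """Anti-notch H(z) = (1 - A(z)) / 2 from a second-order allpass A(z).

    The passband is centred (approximately) on theta; theta = 2*pi/3
    corresponds to period 3. The radius value is an illustrative assumption.
    """
    gain = (1.0 - radius ** 2) / 2.0
    b = [gain, 0.0, -gain]
    a = [1.0, -2.0 * radius * math.cos(theta), radius ** 2]
    return b, a

def period3_feature(seq, radius=0.992):
    """Sum of squared zero-phase anti-notch outputs over the four indicators."""
    b, a = antinotch_coeffs(radius)
    feature = [0.0] * len(seq)
    for x in voss_indicators(seq).values():
        y = zero_phase(b, a, x)
        feature = [f + v * v for f, v in zip(feature, y)]
    return feature

if __name__ == "__main__":
    coding_like = "ATG" * 100   # toy sequence with exact period 3
    background = "ACGT" * 75    # toy sequence with period 4, no period-3 content
    mean = lambda values: sum(values) / len(values)
    print(mean(period3_feature(coding_like)), mean(period3_feature(background)))
```

In this toy comparison the period-3 sequence should yield a much larger mean feature value than the period-4 sequence. Real genomic sequences are far noisier, which is the motivation the abstract gives for a subsequent noise-cancelling stage.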









Data availability statement
The dataset will be provided on request to malayakumar.h@vit.ac.in.
Acknowledgements
The author would like to thank the editor-in-chief and four anonymous reviewers for their constructive comments that improved the manuscript greatly.
Cite this article
Hota, M.K. Empirical mode decomposition based adaptive noise canceller for improved identification of exons in eukaryotes. Netw Model Anal Health Inform Bioinforma 10, 60 (2021). https://doi.org/10.1007/s13721-021-00346-y
DOI: https://doi.org/10.1007/s13721-021-00346-y