DOI:10.1145/1854776.1854841
research-article

Using Fourier phase analysis on genomic sequences to identify retroviruses

Published: 02 August 2010

Abstract

Retroviruses are of great importance because of their association with disease and their significance for understanding the evolution of species. In this paper we study the problem of classifying unknown DNA sequence fragments as retroviruses, genes, or non-coding DNA. We use a novel set of features derived from the Fourier transform at frequency 1/3; these features capture the amount of randomness in sequences from the three classes and how each class uses the three possible reading frames. The features can be computed efficiently and are used to train a random forest classifier. We show that the three classes can be distinguished with high (>90%) accuracy.
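To illustrate what "the Fourier transform at frequency 1/3" can mean for a DNA fragment, here is a minimal sketch. It assumes the common binary-indicator encoding from genomic signal processing (one 0/1 sequence per nucleotide), which may differ from the paper's actual encoding. For each base b it evaluates the DFT coefficient U_b = sum over n of u_b[n] * e^(-2*pi*i*n/3). The magnitude reflects how strongly the sequence shows a period-3 pattern, a pattern often seen in protein-coding DNA. The phase of U_b points to the codon position (reading frame offset) where that base tends to occur. The function names `period3_components`, `period3_features`, and `dominant_offset` are hypothetical. The sketch does not reproduce the paper's randomness measures or the random forest training step.

```python
import cmath
import math

# Twiddle factor for frequency 1/3 (the period-3 component of the DFT).
# Assumption: binary indicator encoding per nucleotide, not necessarily the paper's.
W = cmath.exp(-2j * math.pi / 3)


def period3_components(seq):
    """Return {base: complex DFT coefficient at frequency 1/3} for A, C, G, T."""
    seq = seq.upper()
    # e^{-2*pi*i*n/3} depends only on n mod 3, so precompute the three values.
    tw = [W ** k for k in range(3)]
    comps = {b: 0j for b in "ACGT"}
    for n, base in enumerate(seq):
        if base in comps:  # skip ambiguous symbols such as 'N'
            comps[base] += tw[n % 3]
    return comps


def period3_features(seq):
    """Hypothetical features: length-normalized period-3 power and per-base phases."""
    comps = period3_components(seq)
    n = max(len(seq), 1)
    # Total spectral power at frequency 1/3, summed over the four indicator sequences.
    power = sum(abs(c) ** 2 for c in comps.values()) / n
    # Phase angle of each base's coefficient, in (-pi, pi].
    phases = {b: cmath.phase(c) for b, c in comps.items()}
    return power, phases


def dominant_offset(phase):
    """Map a period-3 phase to the codon position (0, 1 or 2) it points to.

    A base occurring only at positions n = k (mod 3) has phase -2*pi*k/3,
    so k = -phase / (2*pi/3) mod 3.
    """
    return round(-phase / (2 * math.pi / 3)) % 3
```

As a usage example, `period3_features("ATG" * 10)` places A, T, and G at codon positions 0, 1, and 2. Shifting the same repeat by one base to `"TGA" * 10` moves A to position 2 and rotates its phase by 2*pi/3. The phase is therefore sensitive to the reading frame, while the magnitude is not.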

Cited By

  • (2010) Fast algorithms for recognizing retroviruses. 2010 IEEE International Workshop on Genomic Signal Processing and Statistics (GENSIPS), pp. 1-4. DOI:10.1109/GENSIPS.2010.5719668. Online publication date: Nov 2010.

        Published In

        BCB '10: Proceedings of the First ACM International Conference on Bioinformatics and Computational Biology
        August 2010
        705 pages
        ISBN:9781450304382
        DOI:10.1145/1854776
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Author Tags

        1. Fourier phase analysis
        2. classifying sequences
        3. retroviruses

        Qualifiers

        • Research-article

        Conference

        BCB'10

        Acceptance Rates

        Overall Acceptance Rate 254 of 885 submissions, 29%
