DNA AS X: An Information-Coding-Based Model to Improve the Sensitivity in Comparative Gene Analysis

Yu, Ning; Guo, Xuan; Gu, Feng; Pan, Yi

doi:10.1007/978-3-319-19048-8_31

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 9096)

Included in the following conference series:

International Symposium on Bioinformatics Research and Applications

Abstract

Recent studies [1-3] show that, despite decades of research, much of the hidden homology in DNA genomes goes undetected by current comparative tools. Many researchers model the genome as a monotonous character string, which limits and probably obstructs the discovery of some significant patterns. We propose an information-coding-based model called DNA As X (DAX) that improves sensitivity in comparative genomic studies by integrating principles and concepts from other disciplines, including information coding theory and signal processing, into genome analysis. The DAX model uses character-analysis-free (CAF) techniques, where X is the intermediate representation used for analysis and can be a digit, code, signal, vector, tree, graph network and so on. It provides novel and comprehensive perspectives for analyzing and recognizing the critical patterns hidden in DNA genomes. Compared with traditional character-analysis-based (CAB) methods, DAX not only enriches the tools and the knowledge library of computational biology but also extends the domain from 1-D character string analysis to the 2-D spatial/temporal domain. Furthermore, we apply the DAX model to exon prediction as an evaluation and use it to illustrate the insights behind the model. The experimental results show that the DAX methodology can improve sensitivity in genome analysis by using these information-coding techniques.
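To make the idea of "X as a signal" concrete, the following is a minimal sketch under stated assumptions, not the DAX coding scheme itself, which the abstract does not specify. It assumes the well-known binary indicator encoding (one 0/1 sequence per base, in the style of Voss) and scores a sequence by its spectral power at period three, a classic signal-processing cue for protein-coding regions. The function names `indicator_sequences` and `period3_power` are hypothetical and chosen only for illustration.

```python
import cmath


def indicator_sequences(dna):
    """Map a DNA string to four binary indicator sequences, one per base.

    Assumed encoding for illustration; characters other than A, C, G, T
    (for example N) are simply 0 in all four sequences.
    """
    dna = dna.upper()
    return {base: [1 if ch == base else 0 for ch in dna] for base in "ACGT"}


def period3_power(dna):
    """Spectral power at frequency 1/3, summed over the four indicators.

    The result is divided by the sequence length so that windows of
    different sizes can be compared.
    """
    n = len(dna)
    if n == 0:
        return 0.0
    total = 0.0
    for seq in indicator_sequences(dna).values():
        # Single DFT coefficient at a period of three bases.
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * i / 3) for i, x in enumerate(seq)
        )
        total += abs(coeff) ** 2
    return total / n


if __name__ == "__main__":
    # A strictly codon-periodic toy sequence versus one with period two.
    print(period3_power("ATG" * 30))
    print(period3_power("AT" * 45))
```

On the toy inputs, the codon-periodic string scores high and the period-two string scores essentially zero. A real exon predictor would slide such a score along a genome in windows and combine it with other evidence; that step is omitted here.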

References

1. Frith, M.C.: A new repeat-masking method enables specific detection of homologous sequences. Nucleic Acids Research 39(4), e23 (2011)
2. Frith, M.C., Noé, L.: Improved search heuristics find 20 000 new alignments between human and mouse genomes. Nucleic Acids Research 42(7), e59 (2014)
3. Trimble, W., Keegan, K., D’Souza, M., Wilke, A., Wilkening, J., Gilbert, J., Meyer, F.: Short-read reading-frame predictors are not created equal: sequence error causes loss of signal. BMC Bioinformatics 13(1), 183 (2012)
4. Djebali, S., Davis, C.A., Merkel, A., Dobin, A., Lassmann, T., Mortazavi, A.M., Schlesinger, F.: Landscape of transcription in human cells. Nature 489(7414), 101–108 (2012)
5. ENCODE Project Consortium: An integrated encyclopedia of DNA elements in the human genome. Nature 489(7414), 57–74 (September 2012)
6. Hiller, M., Schaar, B.T., Bejerano, G.: Hundreds of conserved non-coding genomic regions are independently lost in mammals. Nucleic Acids Research (2012)
7. Klimke, W., O’Donovan, C., White, O., Brister, J.R., Clark, K., Fedorov, B., Tatusova, T.: Solving the problem: Genome annotation standards before the data deluge. Standards in Genomic Sciences 5(1), 168–193 (2011)
8. Li, H., Homer, N.: A survey of sequence alignment algorithms for next-generation sequencing. Briefings in Bioinformatics 11(5), 473–483 (2010)
9. Wu, X., Cai, Z., Wan, X.-F., Hoang, T., Goebel, R., Lin, G.: Nucleotide composition string selection in HIV-1 subtyping using whole genomes. Bioinformatics 23(14), 1744–1752 (2007)
10. Cai, Z., Goebel, R., Salavatipour, M., Lin, G.: Selecting dissimilar genes for multi-class classification, an application in cancer subtyping. BMC Bioinformatics 8(1), 206 (2007)
11. Tesorero, R.A., Yu, N., Wright, J.O., Svencionis, J.P., Cheng, Q., Kim, J.-H., Cho, K.H.: Novel regulatory small RNAs in Streptococcus pyogenes. PLoS One 8(6), e64021 (2013)
12. Guo, X., Meng, Y., Yu, N., Pan, Y.: Cloud computing for detecting high-order genome-wide epistatic interaction via dynamic clustering. BMC Bioinformatics 15(1), 102 (2014)
13. Yang, K., Cai, Z., Li, J., Lin, G.: A stable gene selection in microarray data analysis. BMC Bioinformatics 7(1), 228 (2006)
14. Cai, Z., Duan, Y., Li, Y., Lin, G., Ozden, M., Wan, X.F.: IPMiner: a progenitor gene identifier for influenza A virus. Influenza Other Respir. Viruses 5(suppl. 1), 413–415 (2011)
15. Silverman, B.D., Linsker, R.: A measure of DNA periodicity. Journal of Theoretical Biology 118(3), 295–300 (1986)
16. Voss, R.F.: Evolution of long-range fractal correlations and 1/f noise in DNA base sequences. Phys. Rev. Lett. 68, 3805–3808 (1992)
17. Cristea, P.D.: Genetic signal representation and analysis. In: Proc. SPIE, vol. 4623, pp. 77–84 (2002)
18. Rosen, G.L.: Signal Processing for Biologically-inspired Gradient Source Localization and DNA Sequence Analysis. PhD thesis, Georgia Institute of Technology, School of Electrical and Computer Engineering (August 2006)
19. Chakravarthy, N., Spanias, A., Iasemidis, L.D., Tsakalis, K.: Autoregressive modeling and feature analysis of DNA sequences. EURASIP Journal on Advances in Signal Processing 2004(1), 952689 (2004)
20. Kent, W.J., Sugnet, C.W., Furey, T.S., Roskin, K.M., Pringle, T.H., Zahler, A.M., Haussler, D.: UCSC genome browser. Genome Res. 12(6), 996–1006 (2002)
21. Kauer, G., Blöcker, H.: Applying signal theory to the analysis of biomolecules. Bioinformatics 19(16), 2016–2021 (2003)
22. Rosen, G.L.: Examining coding structure and redundancy in DNA. IEEE Engineering in Medicine and Biology Magazine, Special Issue on Communication Theory, Coding Theory, and Molecular Biology, 62–68 (January/February 2006)
23. Yoon, B.J.: Hidden Markov models and their applications in biological sequence analysis. Current Genomics 10, 402–415 (2009)
24. Blahut, R.E.: Algebraic Codes for Data Transmission, 2nd edn. Cambridge University Press, Cambridge (2003)
25. Breslauer, K.J., Frank, R.: Predicting DNA duplex stability from the base sequence. Proceedings of the National Academy of Sciences 83(11), 3746–3750 (1986)
26. Crick, F.: Codon and anticodon pairing: the wobble hypothesis. Journal of Molecular Biology 19, 548–555 (1966)
27. Lin, S., Costello, D.J.: Error control coding: fundamentals and applications, vol. 114. Pearson-Prentice Hall, Upper Saddle River (2004)
28. Dubchak, I., Poliakov, A., Kislyuk, A., Brudno, M.: Multiple whole-genome alignments without a reference organism. Genome Res. 19, 682–689 (2009)
29. Batzoglou, S., Pachter, L., Mesirov, J.P., Berger, B., Lander, E.S.: Human and mouse gene structure: Comparative analysis and application to exon prediction. Genome Res. 10, 950–958 (2000)
30. Brudno, M., Do, C.B., Cooper, G.M., Kim, M.F., Davydov, E., Green, E.D., Sidow, A., Batzoglou, S.: LAGAN and Multi-LAGAN: efficient tools for large-scale multiple alignment of genomic DNA. Genome Res. 13 (April 2003)

Author information

Authors and Affiliations

Department of Computer Science, Georgia State University, 25 Park Place, Atlanta, GA, 30319, USA
Ning Yu, Xuan Guo & Yi Pan
Department of Computer Science, College of Staten Island, 2800 Victory Blvd., Staten Island, NY, 10314, USA
Feng Gu

Corresponding author

Correspondence to Ning Yu.

Editor information

Editors and Affiliations

Georgia State University, Atlanta, USA
Robert Harrison
Old Dominion University, Norfolk, USA
Yaohang Li
University of Connecticut, Storrs, Connecticut, USA
Ion Măndoiu

About this paper

Cite this paper

Yu, N., Guo, X., Gu, F., Pan, Y. (2015). DNA AS X: An Information-Coding-Based Model to Improve the Sensitivity in Comparative Gene Analysis. In: Harrison, R., Li, Y., Măndoiu, I. (eds) Bioinformatics Research and Applications. ISBRA 2015. Lecture Notes in Computer Science, vol 9096. Springer, Cham. https://doi.org/10.1007/978-3-319-19048-8_31

DOI: https://doi.org/10.1007/978-3-319-19048-8_31
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19047-1
Online ISBN: 978-3-319-19048-8
eBook Packages: Computer Science, Computer Science (R0)
