DNA AS X: An Information-Coding-Based Model to Improve the Sensitivity in Comparative Gene Analysis

  • Conference paper
Bioinformatics Research and Applications (ISBRA 2015)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 9096)

Abstract

Recent studies [1-3] show that a large amount of hidden homology in DNA genomes goes undetected by current comparative tools, despite decades of research. Many scholars have modeled the genome as a monotonous string, which limits and probably obstructs the discovery of some significant patterns. We propose an information-coding-based model called DNA As X (DAX) that improves sensitivity in comparative genomic studies by integrating principles and concepts from other disciplines, including information coding theory and signal processing, into genome analysis. The DAX model uses character-analysis-free (CAF) techniques, where X is the intermediate representation used for analysis and can be a digit, code, signal, vector, tree, graph network and so on. It provides novel and comprehensive perspectives for analyzing and recognizing the critical patterns hidden in DNA genomes. Compared with traditional character-analysis-based (CAB) methods, DAX not only enriches the tools and knowledge library of computational biology but also extends the analysis domain from 1-D character strings to the 2-D spatial/temporal domain. Furthermore, we illustrate the insights behind the model by applying DAX to exon prediction as an evaluation. The experimental results show that the DAX methodology, through its novel information-coding techniques, can improve sensitivity in genome analysis.
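
To make the "DNA as signal" idea concrete, the following is a minimal sketch and not the authors' DAX implementation, which the abstract does not specify. It assumes a common binary-indicator mapping (one 0/1 sequence per base) and scores a sequence by its spectral power at period 3, a periodicity often associated with protein-coding regions. The function names `indicator_signals`, `power_at_period`, and `period3_score` are hypothetical, chosen only for illustration.

```python
import cmath

BASES = "ACGT"


def indicator_signals(seq):
    """Map a DNA string to four binary indicator sequences, one per base.

    Assumption: this mapping is one possible choice of 'X'; it is not
    taken from the paper.
    """
    seq = seq.upper()
    return {b: [1.0 if ch == b else 0.0 for ch in seq] for b in BASES}


def power_at_period(signal, period):
    """Squared magnitude of the Fourier sum at frequency 1/period."""
    coeff = sum(
        x * cmath.exp(-2j * cmath.pi * i / period)
        for i, x in enumerate(signal)
    )
    return abs(coeff) ** 2


def period3_score(seq):
    """Period-3 power summed over the four indicator signals, per base."""
    if not seq:
        return 0.0
    signals = indicator_signals(seq)
    total = sum(power_at_period(s, 3) for s in signals.values())
    return total / len(seq)


if __name__ == "__main__":
    codon_like = "ATG" * 30   # toy sequence with strict period 3
    flat = "ACGT" * 24        # toy sequence with period 4, none at period 3
    print(period3_score(codon_like))  # large (about 30)
    print(period3_score(flat))        # close to 0
```

In this sketch the analysis never compares characters directly: the string is first converted into numeric signals, and the pattern is read off in the frequency domain. A sliding-window version of `period3_score` would be one hypothetical way to flag candidate exon regions along a longer sequence.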

References

  1. Frith, M.C.: A new repeat-masking method enables specific detection of homologous sequences. Nucleic Acids Research 39(4), e23 (2011)

  2. Frith, M.C., Noé, L.: Improved search heuristics find 20 000 new alignments between human and mouse genomes. Nucleic Acids Research 42(7), e59 (2014)

  3. Trimble, W., Keegan, K., D’Souza, M., Wilke, A., Wilkening, J., Gilbert, J., Meyer, F.: Short-read reading-frame predictors are not created equal: sequence error causes loss of signal. BMC Bioinformatics 13(1), 183 (2012)

  4. Djebali, S., Davis, C.A., Merkel, A., Dobin, A., Lassmann, T., Mortazavi, A.M., Schlesinger, F.: Landscape of transcription in human cells. Nature 489(7414), 101–108 (2012)

  5. ENCODE: An integrated encyclopedia of DNA elements in the human genome. Nature 489(7414), 57–74 (September 2012)

  6. Hiller, M., Schaar, B.T., Bejerano, G.: Hundreds of conserved non-coding genomic regions are independently lost in mammals. Nucleic Acids Research (2012)

  7. Klimke, W., O’Donovan, C., White, O., Brister, J.R., Clark, K., Fedorov, B., Tatusova, T.: Solving the problem: Genome annotation standards before the data deluge. Standards in Genomic Sciences 5(1), 168–193 (2011)

  8. Li, H., Homer, N.: A survey of sequence alignment algorithms for next-generation sequencing. Briefings in Bioinformatics 11(5), 473–483 (2010)

  9. Wu, X., Cai, Z., Wan, X.-F., Hoang, T., Goebel, R., Lin, G.: Nucleotide composition string selection in HIV-1 subtyping using whole genomes. Bioinformatics 23(14), 1744–1752 (2007)

  10. Cai, Z., Goebel, R., Salavatipour, M., Lin, G.: Selecting dissimilar genes for multi-class classification, an application in cancer subtyping. BMC Bioinformatics 8(1), 206 (2007)

  11. Tesorero, R.A., Yu, N., Wright, J.O., Svencionis, J.P., Cheng, Q., Kim, J.-H., Cho, K.H.: Novel regulatory small RNAs in Streptococcus pyogenes. PLoS One 8(6), e64021 (2013)

  12. Guo, X., Meng, Y., Yu, N., Pan, Y.: Cloud computing for detecting high-order genome-wide epistatic interaction via dynamic clustering. BMC Bioinformatics 15(1), 102 (2014)

  13. Yang, K., Cai, Z., Li, J., Lin, G.: A stable gene selection in microarray data analysis. BMC Bioinformatics 7(1), 228 (2006)

  14. Cai, Z., Duan, Y., Li, Y., Lin, G., Ozden, M., Wan, X.F.: IPMiner: a progenitor gene identifier for influenza A virus. Influenza Other Respi. Viruses 5(suppl. 1), 413–415 (2011)

  15. Silverman, B.D., Linsker, R.: A measure of DNA periodicity. Journal of Theoretical Biology 118(3), 295–300 (1986)

  16. Voss, R.F.: Evolution of long-range fractal correlations and 1/f noise in DNA base sequences. Phys. Rev. Lett. 68, 3805–3808 (1992)

  17. Cristea, P.D.: Genetic signal representation and analysis. In: Proc. SPIE, vol. 4623, pp. 77–84 (2002)

  18. Rosen, G.L.: Signal Processing for Biologically-inspired Gradient Source Localization and DNA Sequence Analysis. PhD thesis, Georgia Institute of Technology, School of Electrical and Computer Engineering (August 2006)

  19. Chakravarthy, N., Spanias, A., Iasemidis, L.D., Tsakalis, K.: Autoregressive modeling and feature analysis of DNA sequences. EURASIP Journal on Advances in Signal Processing 2004(1), 952689 (2004)

  20. Kent, W.J., Sugnet, C.W., Furey, T.S., Roskin, K.M., Pringle, T.H., Zahler, A.M., Haussler, D.: UCSC genome browser. Genome Res. 12(6), 996–1006 (2002)

  21. Kauer, G., Blöcker, H.: Applying signal theory to the analysis of biomolecules. Bioinformatics 19(16), 2016–2021 (2003)

  22. Rosen, G.L.: Examining coding structure and redundancy in DNA. IEEE Engineering in Medicine and Biology Magazine, Special Issue on Communication Theory, Coding Theory, and Molecular Biology, 62–68 (January/February 2006)

  23. Yoon, B.J.: Hidden Markov models and their applications in biological sequence analysis. Current Genomics 10, 402–415 (2009)

  24. Blahut, R.E.: Algebraic Codes for Data Transmission, 2nd edn. Cambridge University Press, Cambridge (2003)

  25. Breslauer, K.J., Frank, R.: Predicting DNA duplex stability from the base sequence. Proceedings of the National Academy of Sciences 83(11), 3746–3750 (1986)

  26. Crick, F.: Codon and anticodon pairing: the wobble hypothesis. Journal of Molecular Biology 19, 548–555 (1966)

  27. Lin, S., Costello, D.J.: Error control coding: fundamentals and applications, vol. 114. Pearson-Prentice Hall, Upper Saddle River (2004)

  28. Dubchak, I., Poliakov, A., Kislyuk, A., Brudno, M.: Multiple whole-genome alignments without a reference organism. Genome Res. 19, 682–689 (2009)

  29. Batzoglou, S., Pachter, L., Mesirov, J.P., Berger, B., Lander, E.S.: Human and mouse gene structure: Comparative analysis and application to exon prediction. Genome Res. 10, 950–958 (2000)

  30. Brudno, M., Do, C.B., Cooper, G.M., Kim, M.F., Davydov, E., Green, E.D., Sidow, A., Batzoglou, S.: LAGAN and Multi-LAGAN: efficient tools for large-scale multiple alignment of genomic DNA. Genome Res. 13 (April 2003)

Author information

Corresponding author

Correspondence to Ning Yu.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Yu, N., Guo, X., Gu, F., Pan, Y. (2015). DNA AS X: An Information-Coding-Based Model to Improve the Sensitivity in Comparative Gene Analysis. In: Harrison, R., Li, Y., Măndoiu, I. (eds) Bioinformatics Research and Applications. ISBRA 2015. Lecture Notes in Computer Science, vol 9096. Springer, Cham. https://doi.org/10.1007/978-3-319-19048-8_31

  • DOI: https://doi.org/10.1007/978-3-319-19048-8_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-19047-1

  • Online ISBN: 978-3-319-19048-8

  • eBook Packages: Computer Science, Computer Science (R0)
