Structural visualization of sequential DNA data

Journal of Zhejiang University SCIENCE C

Abstract

Comparing and visualizing genome sequences remains challenging because of the large size of genomes. Existing approaches take advantage of the stable properties of oligonucleotides to exhibit the main characteristics of a whole genome, yet they commonly fail to show, in an adjustable manner, how patterns progress along the genome. This paper presents a novel visual encoding technique that supports not only the binning process (phylogenetic analysis) but also sequential analysis of the genome. The key idea is to regard each k-nucleotide together with its reverse complement as a visual word, and to represent a long genome sequence as a list of local statistical feature vectors derived from the local frequencies of the visual words. Experimental results on a variety of examples demonstrate that the presented approach can quickly and intuitively visualize DNA sequences and help the user identify regions of difference among multiple datasets.
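The key idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the sliding-window scheme, the `window` and `step` parameters, and the normalization to relative frequencies are all assumptions made for illustration, since the abstract does not specify how the local feature vectors are computed. The sketch merges each k-nucleotide with its reverse complement into one word and counts those words inside overlapping windows of the sequence.

```python
from collections import Counter
from itertools import product

# Translation table for complementing a DNA string (A<->T, C<->G).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer):
    """Return the reverse complement of a DNA k-mer."""
    return kmer.translate(COMPLEMENT)[::-1]


def canonical(kmer):
    """Map a k-mer and its reverse complement to one shared label.

    Assumption: the lexicographically smaller of the two strings is used
    as the label of the combined "visual word".
    """
    return min(kmer, reverse_complement(kmer))


def visual_words(k):
    """List every distinct visual word of length k in sorted order."""
    return sorted({canonical("".join(p)) for p in product("ACGT", repeat=k)})


def local_feature_vectors(sequence, k=2, window=8, step=4):
    """Compute one relative-frequency vector per sliding window.

    Hypothetical parameters: `window` is the window length in bases and
    `step` is the shift between consecutive windows.
    """
    words = visual_words(k)
    vectors = []
    for start in range(0, len(sequence) - window + 1, step):
        chunk = sequence[start:start + window]
        counts = Counter(
            canonical(chunk[i:i + k])
            for i in range(len(chunk) - k + 1)
            # Skip k-mers containing ambiguous bases such as N.
            if set(chunk[i:i + k]) <= set("ACGT")
        )
        total = sum(counts.values())
        vectors.append([counts[w] / total if total else 0.0 for w in words])
    return words, vectors


if __name__ == "__main__":
    words, vectors = local_feature_vectors("ACGTACGTACGT", k=2, window=8, step=4)
    print(len(words), len(vectors))
```

Merging reverse complements makes the counts independent of which DNA strand was sequenced and reduces the vocabulary: for k = 2, the 16 dinucleotides collapse to 10 words, because four of them (AT, TA, CG, GC) are their own reverse complements. The resulting list of vectors, one per window, is the kind of sequential representation that a downstream projection or comparison step could consume.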

Author information

Correspondence to Wei Chen.

Additional information

The two authors contributed equally to this work

Project supported by the National Natural Science Foundation of China (Nos. 60873123 and 60903085), the National Basic Research Program (973) of China (No. 2010CB732504), the Natural Science Foundation of Zhejiang Province (No. Y1080618), and the Open Project Program of the State Key Lab of CAD & CG, Zhejiang University, China (No. A0905)

About this article

Cite this article

Mao, Xh., Fu, Jh., Chen, W. et al. Structural visualization of sequential DNA data. J. Zhejiang Univ. - Sci. C 12, 263–272 (2011). https://doi.org/10.1631/jzus.C1000091

  • DOI: https://doi.org/10.1631/jzus.C1000091