Abstract
To date, comparing and visualizing genome sequences remains challenging because of the large size of genomes. Existing approaches take advantage of the stable properties of oligonucleotides and capture the main characteristics of the whole genome, yet they commonly fail to show how patterns progress along the genome at an adjustable level of detail. This paper presents a novel visual encoding technique that supports not only the binning process (phylogenetic analysis) but also sequential analysis of the genome. The key idea is to treat each k-nucleotide together with its reverse complement as a single visual word, and to represent a long genome sequence as a list of local statistical feature vectors derived from the local frequencies of these visual words. Experimental results on a variety of examples demonstrate that the approach can visualize DNA sequences quickly and intuitively, and that it helps the user identify regions of difference among multiple datasets.
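The key idea above (merging each k-nucleotide with its reverse complement into one visual word, then describing a long sequence by per-window word frequencies) can be sketched in Python. This is an illustrative sketch under stated assumptions, not the authors' implementation. The following are all assumptions: the lexicographically smaller canonical form, the sliding window controlled by the hypothetical parameters `window` and `step`, normalization to relative frequencies, and the L1 comparison in `window_differences`. Every function name here is hypothetical.

```python
from collections import Counter
from itertools import product

# Complement map for the four DNA bases (assumption: only A, C, G, T are counted).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_BASES = set("ACGT")


def reverse_complement(kmer: str) -> str:
    """Return the reverse complement of a DNA string."""
    return kmer.translate(_COMPLEMENT)[::-1]


def visual_word(kmer: str) -> str:
    """Merge a k-mer with its reverse complement by keeping the
    lexicographically smaller of the two (assumed canonical form)."""
    return min(kmer, reverse_complement(kmer))


def vocabulary(k: int) -> list[str]:
    """All distinct visual words of length k, in sorted order."""
    return sorted({visual_word("".join(p)) for p in product("ACGT", repeat=k)})


def local_feature_vectors(seq: str, k: int, window: int, step: int) -> list[list[float]]:
    """Slide a window of `window` bases over `seq`, advancing by `step`,
    and return one vector of relative visual-word frequencies per window."""
    seq = seq.upper()
    vocab = vocabulary(k)
    vectors = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        counts = Counter(
            visual_word(chunk[i:i + k])
            for i in range(len(chunk) - k + 1)
            if set(chunk[i:i + k]) <= _BASES  # skip k-mers containing N or other symbols
        )
        total = sum(counts.values()) or 1  # avoid dividing by zero on empty windows
        vectors.append([counts[w] / total for w in vocab])
    return vectors


def window_differences(a: list[list[float]], b: list[list[float]]) -> list[float]:
    """Per-window L1 distance between two feature-vector lists,
    compared up to the length of the shorter list."""
    return [sum(abs(x - y) for x, y in zip(u, v)) for u, v in zip(a, b)]
```

Because odd-length k-mers are never their own reverse complement, k = 3 yields 32 visual words, while k = 2 yields 10 (the palindromes AT, TA, CG and GC each stand alone). Applying `window_differences` to two sequences gives one score per window; in this sketch, large scores mark regions whose local word composition differs, which is the kind of signal the abstract describes exposing visually.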
Additional information
The two authors contributed equally to this work.
Project supported by the National Natural Science Foundation of China (Nos. 60873123 and 60903085), the National Basic Research Program (973) of China (No. 2010CB732504), the Natural Science Foundation of Zhejiang Province (No. Y1080618), and the Open Project Program of the State Key Lab of CAD & CG, Zhejiang University, China (No. A0905).
About this article
Cite this article
Mao, Xh., Fu, Jh., Chen, W. et al. Structural visualization of sequential DNA data. J. Zhejiang Univ. - Sci. C 12, 263–272 (2011). https://doi.org/10.1631/jzus.C1000091
DOI: https://doi.org/10.1631/jzus.C1000091