Abstract
Viruses compress their genomes to save space. One of the main techniques is overlapping genes. We model this process by the shortest common superstring problem, that is, we look for the shortest genome which still contains all genes. We give an algorithm for computing optimal solutions which is slow in the number of strings but fast (linear) in their total length. We apply this algorithm to a number of viruses with relatively few genes. When the number of genes is larger, we compute approximate solutions using the greedy algorithm, which gives an upper bound on the length of the optimal solution. We also give a lower bound for the shortest common superstring problem. The results obtained are then compared with what happens in nature. Remarkably, the compression obtained by viruses is quite high and also very close to the one achieved by modern computers.
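The greedy upper bound mentioned in the abstract can be illustrated with a minimal sketch. This is the textbook greedy heuristic for the shortest common superstring problem (repeatedly merge the two strings with the largest overlap), not necessarily the authors' implementation. The function names `overlap` and `greedy_superstring` and the short example "genes" are hypothetical, chosen only for illustration.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(strings: list[str]) -> str:
    """Textbook greedy heuristic: merge the pair with maximum overlap until
    one string remains. The result contains every input string, so its
    length is an upper bound on the shortest common superstring length."""
    # Drop duplicates, then drop strings already contained in another string.
    unique = list(dict.fromkeys(strings))
    pool = [s for s in unique if not any(s != t and s in t for t in unique)]

    while len(pool) > 1:
        best_k, best_i, best_j = -1, 0, 1
        # Quadratic scan over ordered pairs; fine for a handful of genes.
        for i, a in enumerate(pool):
            for j, b in enumerate(pool):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        # Glue the best pair together, writing the shared part only once.
        merged = pool[best_i] + pool[best_j][best_k:]
        pool = [s for idx, s in enumerate(pool) if idx not in (best_i, best_j)]
        pool.append(merged)

    return pool[0] if pool else ""


# Hypothetical toy "genes", not taken from any real virus.
genes = ["ATGCA", "GCATT", "TTAGG"]
genome = greedy_superstring(genes)
print(genome, len(genome), sum(len(g) for g in genes))
```

On these toy inputs the three strings total 15 characters, while the greedy superstring needs fewer because overlapping regions are stored once. The pairwise scan here is deliberately naive; it does not reflect the running time of the exact algorithm described in the abstract.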
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ilie, L., Tinta, L., Popescu, C., Hill, K.A. (2006). Viral Genome Compression. In: Mao, C., Yokomori, T. (eds) DNA Computing. DNA 2006. Lecture Notes in Computer Science, vol 4287. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11925903_9
DOI: https://doi.org/10.1007/11925903_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49024-1
Online ISBN: 978-3-540-68423-7
eBook Packages: Computer Science, Computer Science (R0)