Abstract:
Recent advances in DNA sequencing technology have driven exponential growth in publicly available genomic sequence data. Whole-genome alignments are a particularly voluminous and frequently used class of static data sets. The first lossless compression algorithm for such data sets, based on well-established statistical evolutionary models and on prediction techniques from lossless binary image compression, is introduced. The compression rate is improved by a factor of 1.6 compared with the currently used Lempel-Ziv (LZ) compression.
Published in: IEEE Transactions on Information Theory (Volume: 56, Issue: 2, February 2010)