A Novel Compression Algorithm for High-Throughput DNA Sequence Based on Huffman Coding Method | IEEE Conference Publication | IEEE Xplore

Abstract:

Next-generation sequencing (NGS) technology can sequence large volumes of DNA in a single run, producing a large number of short DNA reads. Transporting and processing this data is therefore difficult. Compression methods for high-throughput DNA data fall into two kinds: reference-based and reference-free. Reference-free methods can adapt to DNA data from different species without storing a large reference genome. In this paper, we propose a reference-free algorithm, named HDC, that performs high-throughput DNA compression based on Huffman coding and a dictionary method. The algorithm builds multiple dictionaries through Huffman coding and uses them for compression and decompression. In tests on the human, green monkey, and horse genomes, HDC's lowest compression rate is 0.192, reached when compressing the human genome with the chromosome as the compression unit. We also compared HDC with the conventional compression algorithm gzip and with two reference-free DNA compression algorithms, Leon and ORCOM. The results show that HDC performs significantly better than these three algorithms.
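
The abstract does not give the details of HDC, so the following is only a generic illustration of the underlying idea of a Huffman-coded dictionary, not the authors' algorithm. It is a minimal Python sketch that assumes the sequence is split into fixed-length k-mers, builds a single dictionary mapping each k-mer to a Huffman bit string, and checks that decoding restores the input. The function names, the value of k, the sample sequence, and the definition of compression rate as compressed size divided by original size (at 8 bits per base, ignoring the cost of storing the dictionary) are all assumptions made for illustration.

```python
import heapq
from collections import Counter
from itertools import count


def build_huffman_dictionary(symbols):
    """Return {symbol: bit string} using standard Huffman coding."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}
    tiebreak = count()  # unique counter so the dicts are never compared
    heap = [(n, next(tiebreak), {sym: ""}) for sym, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing their codes.
        n1, _, codes1 = heapq.heappop(heap)
        n2, _, codes2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (n1 + n2, next(tiebreak), merged))
    return heap[0][2]


def split_kmers(seq, k):
    """Split seq into consecutive chunks of length k (the last may be shorter)."""
    return [seq[i:i + k] for i in range(0, len(seq), k)]


def encode(seq, k, dictionary):
    """Replace each k-mer with its dictionary code."""
    return "".join(dictionary[s] for s in split_kmers(seq, k))


def decode(bits, dictionary):
    """Invert encode(); this works because Huffman codes are prefix-free."""
    inverse = {code: sym for sym, code in dictionary.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)


# Hypothetical toy input; real reads would come from a FASTA/FASTQ file.
seq = "ACGTACGTACGTTTGACGTACGAACGT"
k = 4
dictionary = build_huffman_dictionary(split_kmers(seq, k))
bits = encode(seq, k, dictionary)
restored = decode(bits, dictionary)

# Assumed definition: compressed bits over original bits (8 bits per base).
rate = len(bits) / (8 * len(seq))
```

Frequent k-mers receive shorter codes, which is why repetitive sequence compresses well under this scheme. A practical compressor would also need to store or transmit the dictionary, which this sketch leaves out of the rate.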
Date of Conference: 13-15 October 2018
Date Added to IEEE Xplore: 03 February 2019
Conference Location: Beijing, China
