Abstract:
Large amounts of genome data are publicly available thanks to the high-throughput sequencing technologies developed in recent years. This availability raises a major concern about data storage costs, given that an effective and efficient compression algorithm for genome data remains an unresolved challenge in genomic data studies. In this paper, we propose DeepDNA, a compression method that uses a hybrid convolutional and recurrent deep neural network to compress human genome data. In the DeepDNA model, the convolutional layer captures the genome's local features, while the recurrent layer captures long-term dependencies for estimating the next-base probabilities in the genomic sequence. Experimental results on human mitochondrial genome datasets show the effectiveness of the DeepDNA method. The code for DeepDNA is available at https://github.com/rongiiewang/deepDNA.
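To illustrate why estimating next-base probabilities matters for compression, the following minimal sketch computes the ideal coded size of a sequence when each base costs -log2(p) bits under a predictive model. It is not the DeepDNA network: a simple adaptive order-k count model stands in for the neural predictor, and the pairing with an entropy coder (such as arithmetic coding) is an assumption that the abstract does not state. The function name `ideal_compressed_bits` and the parameter `k` are hypothetical and chosen only for this example.

```python
import math
from collections import defaultdict

BASES = "ACGT"


def ideal_compressed_bits(seq, k=2):
    """Estimate the bits needed to code `seq` with an adaptive order-k model.

    Stand-in for a learned predictor (assumption, not DeepDNA itself):
    counts of the base that follows each k-base context give
    P(next base | context), with add-one smoothing.
    """
    # Every context starts with a count of 1 per base (uniform prior).
    counts = defaultdict(lambda: {b: 1 for b in BASES})
    total_bits = 0.0
    for i, base in enumerate(seq):
        context = seq[max(0, i - k):i]
        ctx_counts = counts[context]
        p = ctx_counts[base] / sum(ctx_counts.values())
        total_bits += -math.log2(p)  # ideal entropy-coder cost of this base
        ctx_counts[base] += 1        # update after coding, so a decoder can mirror it
    return total_bits


if __name__ == "__main__":
    seq = "ACGT" * 50  # highly repetitive toy sequence
    bits = ideal_compressed_bits(seq, k=2)
    print(f"{bits / len(seq):.3f} bits per base (uniform baseline: 2.000)")
```

The better the predictor concentrates probability on the base that actually occurs, the fewer bits per base are needed. A model that captures both local features and long-range dependencies would, under this assumption, aim to push that figure below the 2 bits per base of a fixed four-symbol encoding.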
Date of Conference: 03-06 December 2018
Date Added to IEEE Xplore: 24 January 2019