Compression-based normal similarity measures for DNA sequences | IEEE Conference Publication | IEEE Xplore

Abstract:

Similarity measures based on compression assess the distance between two objects based on the number of bits needed to describe one, given a description of the other. Theoretically, compression-based similarity depends on the concept of Kolmogorov complexity, which is non-computable. The implementations require compression algorithms that are approximately normal. The approach has important advantages (no signal features to identify and extract, for example) but the compression method must be normal. This paper proposes normal algorithms based on mixtures of finite context models. Normality is attained by combining two new ideas: the use of least-recently-used caching in the context models, to allow deeper contexts, and data interleaving, to better explore that cache. Examples for DNA sequences are given (at the human genome scale).
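
As an illustration of the general idea only, the sketch below computes the normalized compression distance (NCD), a widely used compression-based measure, and checks one property a normal compressor should satisfy: compressing a sequence concatenated with itself should cost little more than compressing it once. The abstract does not name the measure it uses, so NCD is an assumption here, and `zlib` is a general-purpose stand-in, not the paper's mixture of finite context models. The function names and the synthetic sequences are hypothetical.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of the zlib output at level 9; stands in for C(x)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance:
    (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def idempotency_gap(x: bytes) -> float:
    """Relative excess of C(xx) over C(x); close to 0 when the
    compressor behaves normally on x."""
    cx = compressed_size(x)
    return (compressed_size(x + x) - cx) / cx


def random_dna(rng: random.Random, length: int) -> bytes:
    """Synthetic sequence over A, C, G, T (illustrative data, not real DNA)."""
    return "".join(rng.choices("ACGT", k=length)).encode()


rng = random.Random(0)  # fixed seed so the run is reproducible
a = random_dna(rng, 2000)
b = random_dna(rng, 2000)

# Copy of `a` with a single base substituted at position 1000.
substitute = b"A" if a[1000:1001] != b"A" else b"C"
near_a = a[:1000] + substitute + a[1001:]

# Sequence much longer than the 32 KiB DEFLATE window used by zlib.
big = random_dna(rng, 100_000)

print("NCD(a, near_a):", ncd(a, near_a))
print("NCD(a, b):     ", ncd(a, b))
print("idempotency gap, 2 kB:  ", idempotency_gap(a))
print("idempotency gap, 100 kB:", idempotency_gap(big))
```

With this stand-in, the near copy scores much closer to `a` than the unrelated sequence does. The last line shows the limitation that motivates normal compressors: zlib can only refer back 32 KiB, so once a sequence exceeds that window the second copy is compressed almost from scratch and the idempotency gap grows large.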
Date of Conference: 04-09 May 2014
Date Added to IEEE Xplore: 14 July 2014
Electronic ISBN: 978-1-4799-2893-4

Conference Location: Florence, Italy