Abstract:
Similarity measures based on compression assess the distance between two objects based on the number of bits needed to describe one, given a description of the other. The...Show MoreMetadata
Abstract:
Similarity measures based on compression assess the distance between two objects based on the number of bits needed to describe one, given a description of the other. Theoretically, compression-based similarity depends on the concept of Kol-mogorov complexity, which is non-computable. The implementations require compression algorithms that are approximately normal. The approach has important advantages (no signal features to identify and extract, for example) but the compression method must be normal. This paper proposes normal algorithms based on mixtures of finite context models. Normality is attained by combining two new ideas: the use of least-recently-used caching in the context models, to allow deeper contexts, and data interleaving, to better explore that cache. Examples for DNA sequences are given (at the human genome scale).
Published in: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Date of Conference: 04-09 May 2014
Date Added to IEEE Xplore: 14 July 2014
Electronic ISBN:978-1-4799-2893-4