Abstract
We propose a new nonparametric test for component independence based on applying data compressors to ranked data. For a two-component data sample, the idea is to split the sample into two parts and permute one of the components in the second part, while leaving the first part intact. The two resulting samples are then jointly ranked, and a data compressor is applied to the resulting (binary) data string. The components are deemed independent if the string cannot be compressed. Provided the data compressor is ideal, this procedure gives a provably valid test against all possible alternatives (that is, the test is distribution-free).
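The procedure can be illustrated with a minimal Python sketch. Several choices here are assumptions made for illustration and are not taken from the paper. The standard-library `zlib` stands in for the ideal compressor. The data are assumed continuous, so there are no ties. The binary string is formed by ordering the jointly ranked points by |rank_x - rank_y| and writing 0 for points from the intact part and 1 for points from the permuted part. The p-value bound comes from a standard counting argument: under independence the 2n points are exchangeable, so the label string is uniform over the C(2n, n) strings with n ones. Because an injective compressor maps fewer than 2^(m+1) inputs to m bits or fewer, the probability of compressing to at most m bits is below 2^(m+1) / C(2n, n). The names `ranks` and `compression_independence_test` are hypothetical.

```python
import math
import random
import zlib


def ranks(values):
    """Return 0-based ranks of values (assumes no ties, e.g. continuous data)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, idx in enumerate(order):
        result[idx] = rank
    return result


def compression_independence_test(xs, ys, seed=None):
    """Return a p-value bound for H0: the components X and Y are independent.

    Hypothetical sketch: zlib replaces the ideal compressor, and points are
    ordered by |rank_x - rank_y| (an illustrative choice, not from the paper).
    An odd trailing observation is dropped.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have equal length")
    n = len(xs) // 2
    if n == 0:
        return 1.0
    rng = random.Random(seed)
    # Part 1: the first n pairs are left intact.
    first = list(zip(xs[:n], ys[:n]))
    # Part 2: the next n pairs with the Y component randomly permuted.
    y2 = list(ys[n:2 * n])
    rng.shuffle(y2)
    second = list(zip(xs[n:2 * n], y2))
    # Joint sample of 2n points, labelled 0 (intact) or 1 (permuted).
    points = first + second
    labels = [0] * n + [1] * n
    rx = ranks([p[0] for p in points])
    ry = ranks([p[1] for p in points])
    # Order points by a key that never looks at the labels; rx breaks ties.
    order = sorted(range(2 * n), key=lambda i: (abs(rx[i] - ry[i]), rx[i]))
    bits = "".join(str(labels[i]) for i in order)
    compressed_bits = 8 * len(zlib.compress(bits.encode("ascii"), 9))
    # Counting bound: P(compressed length <= m bits) < 2^(m+1) / C(2n, n).
    log2_p = compressed_bits + 1 - math.log2(math.comb(2 * n, n))
    return 2.0 ** min(0.0, log2_p)


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [rng.random() for _ in range(2000)]
    ys_dependent = [x + 0.01 * rng.random() for x in xs]
    ys_independent = [rng.random() for _ in range(2000)]
    print(compression_independence_test(xs, ys_dependent, seed=1))
    print(compression_independence_test(xs, ys_independent, seed=1))
```

Ordering by |rank_x - rank_y| mainly exposes monotone dependence. Any ordering that ignores the labels keeps the bound valid, so other orderings could be substituted. Because zlib adds a header and checksum, this stand-in needs fairly large samples before it can reject.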
This research was supported by the Swiss NSF grants 200020-107616 and 200021-113364.
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ryabko, D. (2007). Testing Component Independence Using Data Compressors. In: de Sá, J.M., Alexandre, L.A., Duch, W., Mandic, D. (eds) Artificial Neural Networks – ICANN 2007. ICANN 2007. Lecture Notes in Computer Science, vol 4669. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74695-9_83
DOI: https://doi.org/10.1007/978-3-540-74695-9_83
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74693-5
Online ISBN: 978-3-540-74695-9
eBook Packages: Computer Science, Computer Science (R0)