Testing Component Independence Using Data Compressors

Ryabko, Daniil

doi:10.1007/978-3-540-74695-9_83

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4669)

Included in the following conference series:

International Conference on Artificial Neural Networks

Abstract

We propose a new nonparametric test for component independence based on applying data compressors to ranked data. For a two-component data sample, the idea is to split the sample into two parts and permute one of the components in the second part while leaving the first part intact. The two resulting samples are then jointly ranked, and a data compressor is applied to the resulting binary string. The components are deemed independent if the string cannot be compressed. This procedure gives a provably valid test against all possible alternatives (that is, the test is distribution-free), provided the data compressor is ideal.
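The procedure outlined in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's implementation. The abstract does not say how two-component points are reduced to a single ranking, so the sketch takes a user-supplied, hypothetical `score` function that maps each pair to a number. It also substitutes `zlib` for the ideal compressor, and its rejection threshold is a plain counting bound chosen for this sketch rather than a value taken from the paper. All function names are made up for the example.

```python
import math
import random
import zlib


def rank_string(sample, score, rng):
    """Split the sample, permute one component, jointly rank, emit labels."""
    half = len(sample) // 2
    first = sample[:half]
    second = sample[half:2 * half]
    # Permute the second component inside the second part only;
    # the first part is left intact.
    ys = [y for _, y in second]
    rng.shuffle(ys)
    second = [(x, y) for (x, _), y in zip(second, ys)]
    # Assumption: points are ranked by a scalar score supplied by the caller.
    labelled = [(score(p), 0) for p in first] + [(score(p), 1) for p in second]
    labelled.sort(key=lambda t: t[0])  # stable sort; ties keep list order
    return [label for _, label in labelled]


def compressed_bits(bits):
    """Length in bits of the bit-packed string after zlib compression."""
    padded = bits + [0] * (-len(bits) % 8)
    packed = bytes(
        int("".join(map(str, padded[i:i + 8])), 2)
        for i in range(0, len(padded), 8)
    )
    return 8 * len(zlib.compress(packed, 9))


def looks_dependent(sample, score, alpha=0.05, seed=0):
    """Reject independence if the rank string compresses below a threshold."""
    bits = rank_string(sample, score, random.Random(seed))
    half = len(bits) // 2
    # With independent components and no ties, every string with `half`
    # zeros and `half` ones is equally likely. A lossless compressor maps
    # fewer than 2**(k + 1) of them to outputs of at most k bits, which
    # gives the threshold below (an assumption of this sketch).
    log2_count = math.log2(math.comb(2 * half, half))
    threshold = log2_count - math.log2(1 / alpha) - 1
    return compressed_bits(bits) <= threshold
```

For example, with a sample where the second component equals the first and the hypothetical score `lambda p: abs(p[0] - p[1])`, every point of the intact part ranks ahead of every permuted point, so the string is a run of zeros followed by a run of ones and compresses easily. A general-purpose compressor such as `zlib` carries fixed overhead and will miss weaker dependencies, which is consistent with the abstract's caveat that validity is stated for an ideal compressor.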

This research was supported by the Swiss NSF grants 200020-107616 and 200021-113364.

Author information

Authors and Affiliations

IDSIA, Galleria 2, CH-6928 Manno, Switzerland
Daniil Ryabko

Editor information

Joaquim Marques de Sá, Luís A. Alexandre, Włodzisław Duch, Danilo Mandic

Cite this paper

Ryabko, D. (2007). Testing Component Independence Using Data Compressors. In: de Sá, J.M., Alexandre, L.A., Duch, W., Mandic, D. (eds) Artificial Neural Networks – ICANN 2007. ICANN 2007. Lecture Notes in Computer Science, vol 4669. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74695-9_83

DOI: https://doi.org/10.1007/978-3-540-74695-9_83
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74693-5
Online ISBN: 978-3-540-74695-9
eBook Packages: Computer Science, Computer Science (R0)
