Application of data compression methods to nonparametric estimation of characteristics of discrete-time stochastic processes

  • Source Coding
Problems of Information Transmission

Abstract

We consider discrete-time stochastic processes that generate elements of either a finite set (alphabet) or an interval of the real line. We address the problems of estimating limiting (or stationary) probabilities and densities, as well as classification and prediction problems, and show that universal coding (data compression) methods can be used to solve them.

Author information

Corresponding author

Correspondence to B. Ya. Ryabko.

Additional information

Original Russian Text © B.Ya. Ryabko, 2007, published in Problemy Peredachi Informatsii, 2007, Vol. 43, No. 4, pp. 109–123.

Supported in part by the Russian Foundation for Basic Research, project no. 06-07-89025.

About this article

Cite this article

Ryabko, B.Y. Application of data compression methods to nonparametric estimation of characteristics of discrete-time stochastic processes. Probl Inf Transm 43, 367–379 (2007). https://doi.org/10.1134/S0032946007040096