Skip to main content

Exploring Kolmogorov Complexity Approximations for Data Analysis: Insights and Applications

  • Conference paper
  • First Online:
Technological Innovation for Connected Cyber Physical Spaces (DoCEIS 2023)

Abstract

This paper presents a PhD research project focused on investigating the use of Kolmogorov complexity approximations as descriptors for various data types, with the aim of addressing inversion problems. The research explores the application of these approximations across different domains while considering the relationship between algorithmic and probabilistic complexities. The study starts with genomic data analysis, where specialized data compressors are employed to improve taxonomic identification, classification, and organization. The research then extends to analysing artistic paintings, utilizing information-based measures to attribute authorship, categorize styles, and describe the content. Additionally, the research examines Turing Machine-generated data, providing insights into the relationship between algorithmic and probabilistic complexities. A method for increasing probabilistic complexity without affecting algorithmic complexity is also proposed. Lastly, a methodology for identifying programs capable of generating outputs approximating given input strings is introduced, offering potential solutions to inversion problems. The paper highlights this research's diverse applications and findings, contributing to understanding the relationship between algorithmic and probabilistic complexities in data analysis.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 84.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Hardcover Book
USD 109.99
Price excludes VAT (USA)
  • Durable hardcover edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. Hutter, M.: Algorithmic information theory. Scholarpedia 2, 2519 (2007)

    Article  Google Scholar 

  2. Nalbantoglu, Ö., Russell, D., Sayood, K.: Data compression concepts and algorithms and their applications to bioinformatics. Entropy 12(1), 34–52 (2009). https://doi.org/10.3390/e12010034

    Article  Google Scholar 

  3. Silva, M., Pratas, D., Pinho, A.J.: Efficient DNA sequence compression with neural networks. GigaScience 9(11), giaa119 (2020). https://doi.org/10.1093/gigascience/giaa119

    Article  Google Scholar 

  4. MacKay, D.J.C., Mac Kay, D.J.C.: Information theory, inference and learning algorithms. Cambridge University Press, Cambridge (2003)

    Google Scholar 

  5. Cohen, A.R., Bjornsson, C.S., Temple, S., Banker, G., Roysam, B.: Automatic summarization of changes in biological image sequences using algorithmic information theory. IEEE Trans. Pattern Anal. Mach. Intell. 31(8), 1386–1403 (2009). https://doi.org/10.1109/TPAMI.2008.162

    Article  Google Scholar 

  6. Maurer, U.: Information-theoretic cryptography. In: Wiener, M. (ed.) Advances in Cryptology — CRYPTO’ 99. LNCS, vol. 1666, pp. 47–65. Springer, Heidelberg (1999). https://doi.org/10.1007/3-540-48405-1_4

    Chapter  Google Scholar 

  7. Yeboah-Ofori, A., Agbodza, C.K., Opoku-Boateng, F.A., Darvishi, I., Sbai, F.: Applied cryptography in network systems security for cyberattack prevention. In: 2021 International Conference on Cyber Security and Internet of Things (ICSIoT), pp. 43–48 (2021)

    Google Scholar 

  8. Tenreiro Machado, J., Lopes, A.M.: Artistic painting: a fractional calculus perspective. Appl. Math. Model. 65, 614–626 (2019)

    Google Scholar 

  9. Li, M., Vitanyi, P.: An Introduction to Kolmogorov Complexity and its Applications. TCS, Springer, New York (2008). https://doi.org/10.1007/978-0-387-49820-1

    Book  MATH  Google Scholar 

  10. Voss, R.F.: Evolution of long-range fractal correlations and 1/f noise in DNA base sequences. Phys. Rev. Lett. 68(25), 3805–3808 (1992). https://doi.org/10.1103/PhysRevLett.68.3805

    Article  Google Scholar 

  11. Ciliberti, S., Martin, O.C., Wagner, A.: Innovation and robustness in complex regulatory gene networks. Proc. Natl. Acad. Sci. 104(34), 13591–13596 (2007). https://doi.org/10.1073/pnas.0705396104

    Article  Google Scholar 

  12. Adami, C., Ofria, C., Collier, T.C.: Evolution of biological complexity. Proc. Natl. Acad. Sci. 97(9), 4463–4468 (2000). https://doi.org/10.1073/pnas.97.9.4463

    Article  Google Scholar 

  13. Rissanen, J.: Modeling by shortest data description. Automatica 14(5), 465–471 (1978)

    Article  MATH  Google Scholar 

  14. RossQuinlan, J., Rivest, R.L.: Inferring decision trees using the minimum description length principle. Inf. Comput. 80(3), 227–248 (1989). https://doi.org/10.1016/0890-5401(89)90010-2

    Article  MathSciNet  MATH  Google Scholar 

  15. Davies, D.L., Bouldin, D.W.: A cluster separation measure. IEEE Trans. Pattern Anal. Mach. Intell. PAMI-1(2), 224–227 (1979). https://doi.org/10.1109/TPAMI.1979.4766909

    Article  Google Scholar 

  16. Dodis, Y., Reyzin, L., Smith, A.: Fuzzy extractors: how to generate strong keys from biometrics and other noisy data. In: Cachin, C., Camenisch, J.L. (eds.) Advances in Cryptology - EUROCRYPT 2004. LNCS, vol. 3027, pp. 523–540. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-24676-3_31

    Chapter  Google Scholar 

  17. Chaitin, G.J.: Algorithmic information theory. IBM J. Res. Dev. 21(4), 350–359 (1977)

    Google Scholar 

  18. Bruce, S.: Applied cryptography: Protocols, Algorthms, and Source Code in c.-2nd (1996)

    Google Scholar 

  19. Kolmogorov, A.N.: Three approaches to the quantitative definition of information. Probl. Inf. Trans. 1(1), 1–7 (1965)

    Google Scholar 

  20. Chaitin, G.J.: On the length of programs for computing finite binary sequences: statistical considerations. J. ACM (JACM). 16(1), 145–159 (1969)

    Google Scholar 

  21. Calude, C.S.: Information and Randomness: An Algorithmic Perspective. Springer Science & Business Media, Heidelberg (2002). https://doi.org/10.1007/978-3-662-03049-3

  22. Sayood, K.: Introduction. In: Introduction to data compression, pp. 1–10. Elsevier (2018). https://doi.org/10.1016/B978-0-12-809474-7.00001-X

    Chapter  MATH  Google Scholar 

  23. Moffat, A.: Word-based text compression. Softw. Pract. Exp. 19(2), 185–198 (1989). https://doi.org/10.1002/spe.4380190207

    Article  Google Scholar 

  24. Knoll, B., de Freitas, N.: A machine learning perspective on predictive coding with PAQ8. In: 2012 Data Compression Conference, pp. 377–386. IEEE (2012)

    Google Scholar 

  25. Carrasco, R.C., Oncina, J.: Learning stochastic regular grammars by means of a state merging method. In: Carrasco, R.C., Oncina, J. (eds.) Grammatical Inference and Applications. LNCS, vol. 862, pp. 139–152. Springer, Heidelberg (1994). https://doi.org/10.1007/3-540-58473-0_144

    Chapter  MATH  Google Scholar 

  26. Silva, J.M., Pratas, D., Caetano, T., Matos, S.: The complexity landscape of viral genomes. GigaScience 11, 1–16 (2022). https://doi.org/10.1093/gigascience/giac079

    Article  Google Scholar 

  27. Silva, J.M., Pratas, D., Antunes, R., Matos, S., Pinho, A.J.: Automatic analysis of artistic paintings using information-based measures. Pattern Recogn. 114, 107864 (2021). https://doi.org/10.1016/j.patcog.2021.107864

    Article  Google Scholar 

  28. Wallace, C.S.: Minimum message length and kolmogorov complexity. Comput. J. 42(4), 270–283 (1999). https://doi.org/10.1093/comjnl/42.4.270

    Article  MATH  Google Scholar 

  29. Hutter, M.: Universal algorithmic intelligence: a mathematical top→down approach. In: Goertzel, B., Pennachin, C. (eds.) Artificial general intelligence, pp. 227–290. Springer Berlin Heidelberg, Heidelberg (2007). https://doi.org/10.1007/978-3-540-68677-4_8

    Chapter  Google Scholar 

  30. Silva, J.M., Almeida, J.R.: The value of compression for taxonomic identification. In: 2022 IEEE 35th International Symposium on Computer-Based Medical Systems (CBMS), pp. 276–281. IEEE (2022)

    Google Scholar 

  31. Zenil, H., Delahaye, J.-P.: An algorithmic information theoretic approach to the behaviour of financial markets. J. Econ. Surv. 25(3), 431–463 (2011)

    Article  Google Scholar 

  32. Silva, J.M., Pratas, D., Caetano, T., Matos, S.: Feature-based classification of archaeal sequences using compression-based methods. In: Pinho, A.J., Georgieva, P., Teixeira, L.F., Sánchez, J.A. (eds.) Pattern Recognition and Image Analysis, pp. 309–320. Springer International Publishing, Cham (2022). https://doi.org/10.1007/978-3-031-04881-4_25

    Chapter  Google Scholar 

  33. jorgeMFS. Complexity ANalysis VirAl Sequences (C.A.N.V.A.S.) Repository (2021). https://github.com/jorgeMFS/canvas

  34. jorgeMFS. Classification and identification of Archaea (ARCHAEA2) Repository (2021). https://github.com/jorgeMFS/Archaea2

  35. bioinformatics ua. COMPressor tAxonomic ClassificaTion (C.O.M.P.A.C.T.) Repository (2021). https://github.com/bioinformatics-ua/COMPACT

  36. CANVAS Website. CANVAS Website (2021). https://asilab.github.io/canvas/

  37. asilab. Measuring probabilistic-algorithmic information of artistic paintings (PANTHER) Repository (2021). https://github.com/asilab/panther

  38. PANTHER Website. PANTHER Website (2021). http://panther.web.ua.pt/

  39. Silva, J.M., Pinho, E., Matos, S., Pratas, D.: Statistical complexity analysis of turing machine tapes with fixed algorithmic complexity using the best-order Markov model. Entropy. 22(1), 105 (2020)

    Google Scholar 

  40. asilab. TMCompression Repository (2021). https://github.com/asilab/TMCompression

  41. jorgeMFS.Turing Machine Recreator (TMRecreator) (2021). https://github.com/jorgeMFS/TMRecreator

  42. jorgeMFS. SPTTM (2021). https://github.com/jorgeMFS/spttm

  43. bioinformatics ua. TM Neural Finder (2021). https://github.com/bioinformatics-ua/TM-Neural-Finder

Download references

Acknowledgments

This work was partially funded by National Funds through the FCT - Foundation for Science and Technology, in the context of the project UIDB/00127/2020. Furthermore, this work has received funding from the EC under grant agreement 101081813, Genomic Data Infrastructure. J.M.S. acknowledges the FCT grant SFRH/BD/141851/2018. National funds funded D.P. through FCT – Fundação para a Ciência e a Tecnologia, I.P., under the Scientific Employment Stimulus - Institutional Call - reference CEECINST/00026/2018.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Jorge Miguel Silva .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Silva, J.M., Pratas, D., Matos, S. (2023). Exploring Kolmogorov Complexity Approximations for Data Analysis: Insights and Applications. In: Camarinha-Matos, L.M., Ferrada, F. (eds) Technological Innovation for Connected Cyber Physical Spaces. DoCEIS 2023. IFIP Advances in Information and Communication Technology, vol 678. Springer, Cham. https://doi.org/10.1007/978-3-031-36007-7_12

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-36007-7_12

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-36006-0

  • Online ISBN: 978-3-031-36007-7

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics