Application of Duality Properties of Renyi Entropy for Parameter Tuning in an Unsupervised Machine Learning Task

  • Conference paper
Applied Informatics (ICAI 2022)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1643)

Abstract

This work demonstrates that the duality properties of a statistical collection of texts can be applied to determine the optimal number of topics/clusters. A series of numerical experiments on text data shows that the Renyi entropy of topic models expressed in the \(S_q\) form (based on the escort distribution), taken as a function of the number of topics, is the most effective for determining the optimal number of topics. By contrast, the \(S_{2-q}\) and \(S_{1/q}\) forms are not suitable for this purpose.
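
As a rough illustration of the three parameterizations named above, the following sketch evaluates the textbook Renyi entropy \(S_q = \frac{1}{1-q}\ln\sum_i p_i^q\) at the orders \(q\), \(2-q\) and \(1/q\), and builds the escort distribution \(P_i = p_i^q / \sum_j p_j^q\). It is not the paper's method: the abstract does not give the exact entropy definition used for topic models, so the formula, the toy distribution `p`, and the helper names `renyi_entropy`, `escort` and `dual_orders` are all assumptions made for illustration.

```python
import math

def renyi_entropy(p, q):
    """Textbook Renyi entropy of order q for a discrete distribution p (assumed form)."""
    if abs(q - 1.0) < 1e-12:
        # The q -> 1 limit of the Renyi entropy is the Shannon entropy.
        return -sum(x * math.log(x) for x in p if x > 0)
    return math.log(sum(x ** q for x in p if x > 0)) / (1.0 - q)

def escort(p, q):
    """Escort distribution of order q: P_i = p_i^q / sum_j p_j^q."""
    powered = [x ** q if x > 0 else 0.0 for x in p]
    z = sum(powered)
    return [x / z for x in powered]

def dual_orders(q):
    """The original order q and its two dual transforms, 2 - q and 1 / q."""
    return {"q": q, "2-q": 2.0 - q, "1/q": 1.0 / q}

# Hypothetical toy distribution; a real experiment would use quantities
# derived from a fitted topic model for each candidate number of topics.
p = [0.5, 0.25, 0.125, 0.125]
q = 0.5

entropies = {name: renyi_entropy(p, order) for name, order in dual_orders(q).items()}
for name, value in entropies.items():
    print(f"S_{{{name}}} = {value:.4f}")
```

In an experiment of the kind the abstract describes, a calculation like this would be repeated for each candidate number of topics, and the resulting entropy curve inspected for its minimum; how the distribution and the order \(q\) are derived from the topic model is left open here.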

The results of the project “Modeling the structure and socio-psychological factors of news perception”, carried out within the framework of the Basic Research Program at the National Research University Higher School of Economics (HSE University) in 2022, are presented in this work.

Author information

Corresponding author

Correspondence to Sergei Koltcov.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Koltcov, S. (2022). Application of Duality Properties of Renyi Entropy for Parameter Tuning in an Unsupervised Machine Learning Task. In: Florez, H., Gomez, H. (eds) Applied Informatics. ICAI 2022. Communications in Computer and Information Science, vol 1643. Springer, Cham. https://doi.org/10.1007/978-3-031-19647-8_14

  • DOI: https://doi.org/10.1007/978-3-031-19647-8_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-19646-1

  • Online ISBN: 978-3-031-19647-8

  • eBook Packages: Computer Science, Computer Science (R0)
