Abstract
This work shows that the duality properties of a statistical collection of texts can be used to determine the optimal number of topics (clusters). In a series of numerical experiments on text data, the Rényi entropy of topic models in the \(S_q\) form (based on the escort distribution), considered as a function of the number of topics, proved the most effective for determining the optimal number of topics. By contrast, the \(S_{2-q}\) and \(S_{1/q}\) forms are not suitable for determining the number of topics.
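For orientation, the three forms named in the abstract can be illustrated with the textbook Rényi entropy, \(S_q = \frac{1}{1-q}\ln \sum_i p_i^q\), evaluated at \(q\), at \(2-q\), and at \(1/q\), together with the escort distribution \(P_i = p_i^q / \sum_j p_j^q\). The sketch below is only an illustration under these generic definitions for a discrete distribution; it is not the chapter's topic-model-specific formulation, and the function names (`renyi_entropy`, `escort`, `dual_forms`) are hypothetical.

```python
import numpy as np


def renyi_entropy(p, q):
    """Textbook Renyi entropy of a discrete distribution p at order q."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]          # zero-probability states do not contribute
    p = p / p.sum()       # normalise defensively
    if np.isclose(q, 1.0):
        # The limit q -> 1 is the Shannon entropy.
        return float(-np.sum(p * np.log(p)))
    return float(np.log(np.sum(p ** q)) / (1.0 - q))


def escort(p, q):
    """Escort distribution P_i = p_i^q / sum_j p_j^q (assumes q > 0)."""
    p = np.asarray(p, dtype=float)
    w = p ** q
    return w / w.sum()


def dual_forms(p, q):
    """Evaluate the entropy at q and at its two dual orders, 2 - q and 1/q."""
    return {
        "S_q": renyi_entropy(p, q),
        "S_2-q": renyi_entropy(p, 2.0 - q),
        "S_1/q": renyi_entropy(p, 1.0 / q),
    }


if __name__ == "__main__":
    # Illustrative distribution only; not data from the chapter.
    p = [0.7, 0.2, 0.1]
    for name, value in dual_forms(p, 0.5).items():
        print(name, round(value, 4))
```

For a uniform distribution all three forms coincide at \(\ln n\); they differ only for non-uniform distributions, which is why the choice of form can matter when the entropy is tracked as a function of the number of topics.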
This work presents the results of the project “Modeling the structure and socio-psychological factors of news perception”, carried out within the framework of the Basic Research Program at the National Research University Higher School of Economics (HSE University) in 2022.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Koltcov, S. (2022). Application of Duality Properties of Renyi Entropy for Parameter Tuning in an Unsupervised Machine Learning Task. In: Florez, H., Gomez, H. (eds) Applied Informatics. ICAI 2022. Communications in Computer and Information Science, vol 1643. Springer, Cham. https://doi.org/10.1007/978-3-031-19647-8_14
DOI: https://doi.org/10.1007/978-3-031-19647-8_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19646-1
Online ISBN: 978-3-031-19647-8
eBook Packages: Computer Science (R0)