Abstract
The paper addresses the problem of tuning topic models with additive regularization by introducing a novel hybrid evolutionary approach that combines the Genetic and Nelder-Mead algorithms to generate domain-specific topic models of higher quality. Introducing Nelder-Mead into the Genetic Algorithm aims to enhance the exploitation capabilities of the resulting hybrid algorithm through improved local search. An experimental study performed on several datasets in Russian and English shows a noticeable increase in the quality of the obtained topic models. Moreover, the experiments demonstrate that the proposed modification also improves the convergence dynamics of the tuning procedure, leading to a steady increase in quality from generation to generation.
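The general idea of a genetic algorithm with Nelder-Mead local search can be illustrated with a minimal Python sketch. This is not the authors' implementation. A toy quadratic loss (`toy_loss`, hypothetical) stands in for the negated topic-model quality score. Each individual is a vector of hypothetical tuning parameters scaled to [0, 1], for example regularizer coefficients. The genetic operators (truncation selection, uniform crossover, Gaussian mutation) and the choice to refine only the best individual with a few Nelder-Mead steps per generation are assumptions made for illustration.

```python
import random

# Hypothetical optimum; the toy loss stands in for (negated) topic-model quality.
TARGET = [0.2, 0.7, 0.5]


def toy_loss(x):
    """Squared distance to TARGET (lower is better)."""
    return sum((xi - ti) ** 2 for xi, ti in zip(x, TARGET))


def nelder_mead(f, x0, step=0.1, iters=20):
    """Basic unconstrained Nelder-Mead minimisation starting from x0."""
    n = len(x0)
    # Initial simplex: x0 plus one vertex shifted by `step` along each axis.
    simplex = [list(x0)]
    for i in range(n):
        vertex = list(x0)
        vertex[i] += step
        simplex.append(vertex)
    for _ in range(iters):
        simplex.sort(key=f)
        best, second_worst, worst = simplex[0], simplex[-2], simplex[-1]
        # Centroid of all vertices except the worst one.
        centroid = [sum(v[i] for v in simplex[:-1]) / n for i in range(n)]
        reflected = [2 * c - w for c, w in zip(centroid, worst)]
        if f(reflected) < f(best):
            # Expansion: move twice as far along the reflection direction.
            expanded = [3 * c - 2 * w for c, w in zip(centroid, worst)]
            simplex[-1] = expanded if f(expanded) < f(reflected) else reflected
        elif f(reflected) < f(second_worst):
            simplex[-1] = reflected
        else:
            # Inside contraction only (the outside variant is omitted for brevity).
            contracted = [(c + w) / 2 for c, w in zip(centroid, worst)]
            if f(contracted) < f(worst):
                simplex[-1] = contracted
            else:
                # Shrink all vertices halfway towards the best one.
                simplex = [best] + [[(b + x) / 2 for b, x in zip(best, v)]
                                    for v in simplex[1:]]
    return min(simplex, key=f)


def hybrid_ga(f, dim, pop_size=20, generations=30, nm_iters=20, seed=0):
    """Genetic algorithm whose best individual is refined by Nelder-Mead each generation."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(dim)] for _ in range(pop_size)]
    history = []  # best loss after local search, one entry per generation
    for _ in range(generations):
        pop.sort(key=f)
        # Exploitation step: local search around the current best individual.
        pop[0] = nelder_mead(f, pop[0], iters=nm_iters)
        history.append(f(pop[0]))
        # Truncation selection: the better half (including the refined best) survives.
        elite = pop[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            # Uniform crossover followed by Gaussian mutation clipped to [0, 1].
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            child = [min(1.0, max(0.0, g + rng.gauss(0, 0.1))) if rng.random() < 0.2 else g
                     for g in child]
            children.append(child)
        pop = elite + children
    return min(pop, key=f), history


if __name__ == "__main__":
    best, history = hybrid_ga(toy_loss, dim=3)
    print(best, history[-1])
```

In this sketch the refined individual always survives into the next generation, and this Nelder-Mead variant never returns a point worse than its starting vertex, so the best loss recorded per generation cannot increase. Each Nelder-Mead call adds objective evaluations, which is costly when every evaluation means training a topic model. The local search here is also unconstrained, so a real tuner would need to clip or penalise out-of-range parameters.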
Acknowledgements
This research is financially supported by the Russian Science Foundation, Agreement 17-71-30029, with co-financing of Bank Saint Petersburg.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Khodorchenko, M., Butakov, N., Nasonov, D. (2023). Improved Evolutionary Approach for Tuning Topic Models with Additive Regularization. In: García Bringas, P., et al. Hybrid Artificial Intelligent Systems. HAIS 2023. Lecture Notes in Computer Science, vol 14001. Springer, Cham. https://doi.org/10.1007/978-3-031-40725-3_35
DOI: https://doi.org/10.1007/978-3-031-40725-3_35
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-40724-6
Online ISBN: 978-3-031-40725-3
eBook Packages: Computer Science, Computer Science (R0)