Abstract
This paper proposes reexamining ancestors of modern topic modeling techniques that seem to have been forgotten. We present an experiment comparing results obtained with six contemporary techniques against a factorization technique developed in the early 1960s and a contemporary adaptation of it based on non-negative matrix factorization. Results on internal and external coherence, as well as on topic diversity, suggest that extracting topics by applying factorization methods to a word-by-word correlation matrix, computed on documents segmented into smaller contextual windows, produces topics that are clearly more coherent and more diverse than those produced by topic modeling techniques based on term-document matrices.
Notes
- 6. The two additional datasets are available from https://provalisresearch.com/tm/datasets.zip.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Peladeau, N. (2022). Revisiting the Past to Reinvent the Future: Topic Modeling with Single Mode Factorization. In: Rosso, P., Basile, V., Martínez, R., Métais, E., Meziane, F. (eds) Natural Language Processing and Information Systems. NLDB 2022. Lecture Notes in Computer Science, vol 13286. Springer, Cham. https://doi.org/10.1007/978-3-031-08473-7_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-08472-0
Online ISBN: 978-3-031-08473-7