Evaluating Hyperparameter Alpha of LDA Topic Modeling
Contributors
- 1. Universität der Bundeswehr München, Germany
- 2. Universität Potsdam, Germany
- 3. Digital Humanities im deutschsprachigen Raum e.V., Germany
Description
As a quantitative text-analytic method, Latent Dirichlet Allocation (LDA) topic modeling has been widely used in Digital Humanities in recent years to explore large collections of unstructured text. Topic modeling involves many parameters that can influence the result, such as the hyperparameters Alpha and Beta, the number of topics, document length, and the number of model-update iterations. The present research evaluates the influence of the hyperparameter Alpha on a newspaper corpus and a literary text corpus from two perspectives: document classification and topic coherence. The results show that, to obtain better topic-model-based document classification and more coherent topics, one should avoid training topic models with the Alpha of each topic set higher than 1.
A contribution to the 8th conference of the association "Digital Humanities im deutschsprachigen Raum" - DHd 2022 Kulturen des digitalen Gedächtnisses.
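As a rough illustration of what the hyperparameter Alpha controls, the following sketch samples document-topic distributions from a symmetric Dirichlet prior and reports how much probability mass the single most prominent topic receives on average. It does not reproduce the study's corpora, models, or results. The function names, the number of topics (20), the number of simulated documents, and the Alpha values are all assumptions chosen for illustration.

```python
import random


def sample_dirichlet(alpha, num_topics, rng):
    """Draw one topic distribution from a symmetric Dirichlet(alpha) prior."""
    # A Dirichlet draw can be built from independent Gamma(alpha, 1) draws
    # that are then normalized to sum to 1.
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(num_topics)]
    total = sum(draws)
    if total == 0.0:
        # Guard against numerical underflow for very small alpha
        # (illustrative fallback: a uniform distribution).
        return [1.0 / num_topics] * num_topics
    return [d / total for d in draws]


def mean_top_topic_share(alpha, num_topics=20, num_docs=2000, seed=0):
    """Average share of the most prominent topic over simulated documents."""
    rng = random.Random(seed)
    shares = [
        max(sample_dirichlet(alpha, num_topics, rng)) for _ in range(num_docs)
    ]
    return sum(shares) / num_docs


if __name__ == "__main__":
    # Hypothetical Alpha values: below 1, equal to 1, and above 1.
    for alpha in (0.1, 1.0, 5.0):
        print(f"alpha={alpha}: mean top-topic share = "
              f"{mean_top_topic_share(alpha):.3f}")
```

Under these assumptions, small Alpha values concentrate each simulated document on a few topics, while values above 1 spread the mass more evenly across all topics. This only describes the behavior of the prior; the evaluation of classification and coherence is in the contribution itself.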
Files
DU_Keli_Evaluating_Hyperparameter_Alpha_of_LDA_Topic_Modelin.pdf
Total size: 717.7 kB
Checksum | Size |
---|---|
md5:56146695c91f30769d5fdc4b1fddc407 | 694.9 kB |
md5:953f7db0d2d85439f726d5b18dbf5263 | 22.8 kB |
Additional details
Related works
- Is part of
- Book: 10.5281/zenodo.6304590 (DOI)