A Supervised Parameter Estimation Method of LDA

Zhenyan, Liu; Dan, Meng; Weiping, Wang; Chunxia, Zhang

doi:10.1007/978-3-319-25255-1_33

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9313)

Included in the following conference series:

Asia-Pacific Web Conference

Abstract

Latent Dirichlet Allocation (LDA) is a probabilistic topic model and an effective dimension-reduction tool: it automatically extracts latent topics and represents text in a lower-dimensional semantic topic space. However, the original LDA and most of its variants are unsupervised and make no use of the category labels of the documents in the training corpus. Most of them also treat all terms in the vocabulary as equally important, although term weights differ, especially in a skewed corpus in which some categories have many more samples than others. We therefore propose a supervised parameter estimation method, based on category and document information, that estimates the parameters of LDA according to term weight. Comparative experiments show that the proposed method performs better on skewed text classification and substantially improves the recall and precision of the minority category.
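
The abstract does not give the estimation formulas, so the following is only a minimal sketch of the general idea of estimating an LDA parameter from term-weighted counts, not the authors' method. It assumes topic assignments for each token are already available (for example from a Gibbs sampler) and applies the usual smoothed count estimate of the topic-word distribution, with each token counted by its term weight instead of by 1. The function name `weighted_phi`, the `weights` dictionary, and the smoothing value `beta` are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def weighted_phi(tokens, assignments, weights, num_topics, vocab, beta=0.01):
    """Estimate topic-word distributions from term-weighted counts.

    tokens      -- list of terms, one per token in the corpus
    assignments -- list of topic indices, one per token (assumed given)
    weights     -- dict mapping term -> weight (hypothetical; e.g. derived
                   from category and document statistics); missing terms
                   default to weight 1.0, which recovers unweighted counts
    """
    # counts[k][t] accumulates the total weight of term t assigned to topic k
    counts = [defaultdict(float) for _ in range(num_topics)]
    for term, topic in zip(tokens, assignments):
        counts[topic][term] += weights.get(term, 1.0)

    phi = []
    for k in range(num_topics):
        # Symmetric Dirichlet smoothing with hyperparameter beta
        total = sum(counts[k].values()) + beta * len(vocab)
        phi.append({t: (counts[k].get(t, 0.0) + beta) / total for t in vocab})
    return phi


# Toy usage: "a" is given twice the weight of "b".
phi = weighted_phi(
    tokens=["a", "b", "a"],
    assignments=[0, 0, 1],
    weights={"a": 2.0, "b": 1.0},
    num_topics=2,
    vocab=["a", "b"],
    beta=0.5,
)
```

With all weights equal to 1.0 this reduces to the standard unweighted estimate, which makes the effect of any particular weighting scheme easy to compare against a baseline.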

Author information

Authors and Affiliations

Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China
Liu Zhenyan
University of Chinese Academy of Sciences, Beijing, 100049, China
Liu Zhenyan
Institute of Information Engineering, Chinese Academy of Sciences, Beijing, 100093, China
Liu Zhenyan, Meng Dan & Wang Weiping
School of Software, Beijing Institute of Technology, Beijing, 100081, China
Liu Zhenyan & Zhang Chunxia

Editor information

Editors and Affiliations

University of Hong Kong, Hong Kong, China
Reynold Cheng
Computer Science, Peking University, Beijing, China
Bin Cui
Advanced Digital Sciences Center (ADSC), Singapore, Singapore
Zhenjie Zhang
University of Technology, Guangzhou, China
Ruichu Cai
Guangxi University, Guangxi, China
Jia Xu

About this paper

Cite this paper

Zhenyan, L., Dan, M., Weiping, W., Chunxia, Z. (2015). A Supervised Parameter Estimation Method of LDA. In: Cheng, R., Cui, B., Zhang, Z., Cai, R., Xu, J. (eds) Web Technologies and Applications. APWeb 2015. Lecture Notes in Computer Science, vol 9313. Springer, Cham. https://doi.org/10.1007/978-3-319-25255-1_33

DOI: https://doi.org/10.1007/978-3-319-25255-1_33
Published: 13 November 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25254-4
Online ISBN: 978-3-319-25255-1
eBook Packages: Computer Science, Computer Science (R0)