Unsupervised Text Learning Based on Context Mixture Model with Dirichlet Prior

Chen, Dongling; Wang, Daling; Yu, Ge

doi:10.1007/978-3-540-89376-9_17

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4977)

Included in the following conference series:

Asia-Pacific Web Conference

Abstract

In this paper, we propose a Bayesian mixture model that introduces a context variable with a Dirichlet prior in order to model the multiple topics of a text and then cluster documents. It is a novel unsupervised text learning algorithm for clustering large-scale web data. For parameter estimation, we adopt Maximum Likelihood (ML) with the EM algorithm, and we employ the Bayesian information criterion (BIC) to determine the number of clusters. Experimental results show that the proposed method distinctly outperforms the baseline algorithms.
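The abstract names three ingredients: a mixture model with a Dirichlet prior, EM for parameter estimation, and BIC for choosing the number of clusters. The sketch below shows how these pieces commonly fit together. It is not the authors' context mixture model, whose details are not given in this preview. It assumes a plain multinomial mixture over a document-term count matrix, a symmetric Dirichlet prior used as a pseudo-count in the M-step (a MAP variant of the ML estimate the abstract mentions), and BIC computed with the number of documents as the sample size. The names `fit_multinomial_mixture`, `bic`, `select_num_clusters` and the hyperparameter `alpha` are hypothetical.

```python
import numpy as np


def fit_multinomial_mixture(X, k, alpha=1.1, n_iter=100, seed=0):
    """EM for a k-component multinomial mixture (illustrative, not the paper's model).

    X is a (n_docs, n_terms) matrix of term counts. alpha > 1 is an assumed
    symmetric Dirichlet hyperparameter; alpha - 1 is added as a pseudo-count.
    """
    rng = np.random.default_rng(seed)
    n_docs, n_terms = X.shape
    pi = np.full(k, 1.0 / k)                      # mixing weights
    theta = rng.random((k, n_terms)) + 0.5        # strictly positive start
    theta /= theta.sum(axis=1, keepdims=True)     # per-cluster term distributions

    def posterior(pi, theta):
        # Log joint of each document with each cluster, up to the multinomial
        # coefficient, which does not depend on the parameters.
        log_joint = np.log(pi) + X @ np.log(theta).T
        log_norm = np.logaddexp.reduce(log_joint, axis=1, keepdims=True)
        return np.exp(log_joint - log_norm), float(log_norm.sum())

    for _ in range(n_iter):
        resp, log_lik = posterior(pi, theta)      # E-step: responsibilities
        pi = resp.sum(axis=0) + alpha - 1.0       # M-step with Dirichlet pseudo-counts
        pi /= pi.sum()
        theta = resp.T @ X + alpha - 1.0
        theta /= theta.sum(axis=1, keepdims=True)

    resp, log_lik = posterior(pi, theta)          # evaluate at the final parameters
    return pi, theta, resp, log_lik


def bic(log_lik, k, n_docs, n_terms):
    """BIC = -2 log L + p log n, with p counting free parameters."""
    n_params = (k - 1) + k * (n_terms - 1)
    return -2.0 * log_lik + n_params * np.log(n_docs)


def select_num_clusters(X, candidates):
    """Fit one mixture per candidate k and return the k with the lowest BIC."""
    scores = {}
    for k in candidates:
        *_, log_lik = fit_multinomial_mixture(X, k)
        scores[k] = bic(log_lik, k, *X.shape)
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    # Toy counts: two documents use terms 0-1, two use terms 2-3.
    X = np.array([[5, 4, 0, 0], [6, 3, 0, 0], [0, 0, 5, 4], [0, 1, 4, 6]])
    best_k, scores = select_num_clusters(X, [1, 2, 3])
    print(best_k, scores)
```

Hard cluster labels can be read off as `resp.argmax(axis=1)`. The log-likelihood omits the multinomial coefficient, which is constant across candidate values of k and so does not affect the BIC comparison.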

Author information

Authors and Affiliations

Northeastern University, Shenyang, 110004, P.R. China
Dongling Chen, Daling Wang & Ge Yu
School of Information, Shenyang University, Shenyang, 110044, P.R. China
Dongling Chen

Editor information

Editors and Affiliations

Nagoya University, Nagoya, Japan
Yoshiharu Ishikawa
CAS Research Center on Data Technology and Knowledge Economy, Beijing, China
Jing He & Yong Shi
Victoria University, Melbourne, Australia
Guandong Xu
Institute of Software, Chinese Academy of Sciences, Beijing, China
Guangyan Huang
CSIRO ICT Centre, Brisbane, QLD, Australia
Chaoyi Pang & Qing Zhang
Northeastern University, Shenyang, China
Guoren Wang

About this paper

Cite this paper

Chen, D., Wang, D., Yu, G. (2008). Unsupervised Text Learning Based on Context Mixture Model with Dirichlet Prior. In: Ishikawa, Y., et al. Advanced Web and Network Technologies, and Applications. APWeb 2008. Lecture Notes in Computer Science, vol 4977. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89376-9_17

DOI: https://doi.org/10.1007/978-3-540-89376-9_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-89375-2
Online ISBN: 978-3-540-89376-9
eBook Packages: Computer Science, Computer Science (R0)