Unsupervised Text Learning Based on Context Mixture Model with Dirichlet Prior

  • Conference paper
Advanced Web and Network Technologies, and Applications (APWeb 2008)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4977)

Abstract

In this paper, we propose a Bayesian mixture model that introduces a context variable with a Dirichlet prior to model the multiple topics of a text and then cluster documents. It is a novel unsupervised text learning algorithm for clustering large-scale web data. For parameter estimation, we adopt maximum likelihood (ML) with the EM algorithm, and we employ the Bayesian information criterion (BIC) to determine the number of clusters. Experimental results show that the proposed method distinctly outperforms the baseline algorithms.
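
The abstract names EM for parameter estimation and BIC for choosing the number of clusters, but it gives no formulas. As a rough illustration only, the sketch below fits a plain mixture of multinomials to a document-term count matrix with EM and scores each candidate cluster count with BIC. It is not the paper's context mixture model: the context variable and its Dirichlet prior are left out, and the function names, the `smooth` pseudo-count, and the parameter count used in the penalty are all assumptions made for this example.

```python
import numpy as np


def _log_sum_exp(a):
    """Row-wise log-sum-exp, computed stably."""
    m = a.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=1, keepdims=True))).ravel()


def fit_multinomial_mixture(X, k, n_iter=100, smooth=1e-2, seed=0):
    """EM for a k-component mixture of multinomials (hypothetical helper).

    X is an (n_docs, vocab_size) matrix of word counts. `smooth` is a small
    pseudo-count (an assumption) that keeps word probabilities above zero.
    Returns mixing weights, word distributions, responsibilities, and the
    log-likelihood up to a constant that does not depend on k.
    """
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    resp = rng.dirichlet(np.ones(k), size=n)  # random soft assignments
    for _ in range(n_iter):
        # M-step: re-estimate mixing weights and per-cluster word distributions.
        weights = resp.sum(axis=0) / n
        counts = resp.T @ X + smooth
        theta = counts / counts.sum(axis=1, keepdims=True)
        # E-step: posterior probability of each cluster for each document.
        log_joint = np.log(weights + 1e-300) + X @ np.log(theta).T
        log_norm = _log_sum_exp(log_joint)
        resp = np.exp(log_joint - log_norm[:, None])
    return weights, theta, resp, log_norm.sum()


def bic(log_lik, k, vocab_size, n_docs):
    """BIC = -2 log L + p log n; lower is better."""
    n_params = (k - 1) + k * (vocab_size - 1)  # free parameters of the mixture
    return -2.0 * log_lik + n_params * np.log(n_docs)


def select_num_clusters(X, candidates):
    """Fit one model per candidate k and return the k with the lowest BIC."""
    n, v = X.shape
    scores = {k: bic(fit_multinomial_mixture(X, k)[3], k, v, n) for k in candidates}
    return min(scores, key=scores.get), scores
```

Calling `select_num_clusters(X, range(1, 6))` on a count matrix `X` returns the selected cluster count together with the BIC score of every candidate. The multinomial coefficient is dropped from the log-likelihood because it is the same for every `k` and so does not change the comparison.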

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chen, D., Wang, D., Yu, G. (2008). Unsupervised Text Learning Based on Context Mixture Model with Dirichlet Prior. In: Ishikawa, Y., et al. Advanced Web and Network Technologies, and Applications. APWeb 2008. Lecture Notes in Computer Science, vol 4977. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89376-9_17

  • DOI: https://doi.org/10.1007/978-3-540-89376-9_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-89375-2

  • Online ISBN: 978-3-540-89376-9

  • eBook Packages: Computer Science, Computer Science (R0)
