Abstract
In this paper, we propose a Bayesian mixture model that introduces a context variable with a Dirichlet prior to model the multiple topics of a text and then cluster documents. It is a novel unsupervised text learning algorithm for clustering large-scale web data. For parameter estimation, we adopt Maximum Likelihood (ML) with the EM algorithm, and we employ the BIC principle to determine the number of clusters. Experimental results show that the proposed method distinctly outperforms the baseline algorithms.
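The abstract names EM estimation and BIC-based selection of the number of clusters but does not show either. As a rough illustration only, the sketch below fits a plain mixture of multinomials to word-count vectors with EM and scores each candidate cluster count with BIC. It is not the paper's context mixture model: the context variable is omitted, and the function names, the `alpha` pseudo-count (smoothing of the kind a symmetric Dirichlet prior induces), and the toy data are all assumptions made for this example.

```python
import math
import random


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def fit_mixture(docs, vocab_size, k, alpha=1.0, iters=50, seed=0):
    """EM for a mixture of multinomials (hypothetical helper, not the paper's model).

    docs: list of word-count lists, each of length vocab_size.
    alpha: assumed pseudo-count added in the M-step (Dirichlet-style smoothing).
    Returns (weights, word_probs, log_likelihood, responsibilities).
    """
    rng = random.Random(seed)
    n = len(docs)
    # Start from random soft assignments of documents to clusters.
    resp = []
    for _ in range(n):
        row = [rng.random() + 1e-3 for _ in range(k)]
        total = sum(row)
        resp.append([r / total for r in row])

    weights, word_probs, log_lik = [], [], float("-inf")
    for _ in range(iters):
        # M-step: smoothed cluster weights and per-cluster word distributions.
        weights = [
            (sum(resp[i][c] for i in range(n)) + alpha) / (n + k * alpha)
            for c in range(k)
        ]
        word_probs = []
        for c in range(k):
            counts = [alpha] * vocab_size
            for i, doc in enumerate(docs):
                for w, x in enumerate(doc):
                    counts[w] += resp[i][c] * x
            total = sum(counts)
            word_probs.append([v / total for v in counts])

        # E-step: posterior cluster probabilities for each document.
        # The multinomial coefficient is dropped; it does not depend on k.
        log_lik = 0.0
        for i, doc in enumerate(docs):
            scores = [
                math.log(weights[c])
                + sum(x * math.log(word_probs[c][w]) for w, x in enumerate(doc) if x)
                for c in range(k)
            ]
            norm = log_sum_exp(scores)
            resp[i] = [math.exp(s - norm) for s in scores]
            log_lik += norm
    return weights, word_probs, log_lik, resp


def bic(log_lik, k, vocab_size, n_docs):
    """BIC = -2 log L + (free parameters) * log N; lower is better."""
    n_params = (k - 1) + k * (vocab_size - 1)
    return -2.0 * log_lik + n_params * math.log(n_docs)


# Toy corpus (made up): counts over a 4-word vocabulary.
toy_docs = [[5, 4, 0, 0], [6, 5, 0, 1], [4, 6, 1, 0],
            [0, 1, 5, 6], [0, 0, 6, 4], [1, 0, 4, 5]]

for k in (1, 2, 3):
    _, _, ll, _ = fit_mixture(toy_docs, vocab_size=4, k=k)
    print(k, round(bic(ll, k, 4, len(toy_docs)), 2))
```

The candidate with the lowest BIC would be taken as the number of clusters. EM only finds a local optimum, so in practice one would typically run several seeds per candidate and keep the best log-likelihood.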
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chen, D., Wang, D., Yu, G. (2008). Unsupervised Text Learning Based on Context Mixture Model with Dirichlet Prior. In: Ishikawa, Y., et al. Advanced Web and Network Technologies, and Applications. APWeb 2008. Lecture Notes in Computer Science, vol 4977. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89376-9_17
DOI: https://doi.org/10.1007/978-3-540-89376-9_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-89375-2
Online ISBN: 978-3-540-89376-9
eBook Packages: Computer Science, Computer Science (R0)