Abstract
A blogger's interests are usually identified by classifying the sequence of his or her blog entries. Constructing a blog entry classifier requires, as training data, a fairly large set of blog entries that have been manually annotated with class labels. In contrast, a set of blog sites with class labels is easy to obtain. In this paper, we present a method for constructing a blog entry classifier using only a set of blog sites with class labels. Our method is based on the Naive Bayes classifier coupled with the EM algorithm.
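The abstract does not give the details of the model, so the following is only a minimal sketch of how a Naive Bayes classifier might be combined with EM when labels exist at the blog level but not at the entry level. It is not the authors' implementation. Every entry starts with its blog's label, and EM then softly reassigns entries. The function names, the toy data, and in particular the `stay` parameter (an assumed probability that an entry shares the topic of its blog) are hypothetical choices made for illustration.

```python
import math
from collections import defaultdict


def m_step(entries, resp, classes, vocab, alpha=1.0):
    """Re-estimate Laplace-smoothed Naive Bayes parameters from soft entry labels."""
    class_mass = {c: alpha for c in classes}
    word_mass = {c: defaultdict(float) for c in classes}
    for (_, tokens), r in zip(entries, resp):
        for c in classes:
            class_mass[c] += r[c]
            for w in tokens:
                word_mass[c][w] += r[c]
    total = sum(class_mass.values())
    log_prior = {c: math.log(class_mass[c] / total) for c in classes}
    log_word = {}
    for c in classes:
        denom = sum(word_mass[c].values()) + alpha * len(vocab)
        log_word[c] = {w: math.log((word_mass[c][w] + alpha) / denom) for w in vocab}
    return log_prior, log_word


def e_step(entries, model, classes, stay=0.7):
    """Compute soft entry labels; the blog label acts as a prior (assumption).

    Requires at least two classes and 0 < stay < 1.
    """
    _, log_word = model
    other = (1.0 - stay) / (len(classes) - 1)
    resp = []
    for blog_label, tokens in entries:
        scores = {}
        for c in classes:
            blog_prior = stay if c == blog_label else other
            scores[c] = math.log(blog_prior) + sum(log_word[c][w] for w in tokens)
        top = max(scores.values())  # subtract the max for numerical stability
        norm = sum(math.exp(s - top) for s in scores.values())
        resp.append({c: math.exp(scores[c] - top) / norm for c in classes})
    return resp


def train(blogs, classes, n_iter=5, stay=0.7):
    """blogs: list of (blog_label, [entry, ...]); each entry is a list of tokens."""
    entries = [(label, tokens) for label, blog in blogs for tokens in blog]
    vocab = sorted({w for _, tokens in entries for w in tokens})
    # Initialisation: every entry inherits the label of its blog.
    resp = [{c: float(c == label) for c in classes} for label, _ in entries]
    model = m_step(entries, resp, classes, vocab)
    for _ in range(n_iter):
        resp = e_step(entries, model, classes, stay)
        model = m_step(entries, resp, classes, vocab)
    return model


def classify(model, tokens, classes):
    """Label a new entry with the trained model; unseen words are skipped."""
    log_prior, log_word = model

    def score(c):
        return log_prior[c] + sum(log_word[c][w] for w in tokens if w in log_word[c])

    return max(classes, key=score)


# Toy data (hypothetical): each blog contains one off-topic entry.
classes = ["sports", "cooking"]
blogs = [
    ("sports", [["goal", "match", "team"], ["team", "win", "match"], ["recipe", "oven"]]),
    ("cooking", [["recipe", "oven", "bake"], ["bake", "flour", "recipe"], ["match", "goal"]]),
]
model = train(blogs, classes)
print(classify(model, ["goal", "team"], classes))
print(classify(model, ["flour", "bake"], classes))
```

Using the blog label as a prior in the E-step, rather than as a hard constraint, lets an off-topic entry drift toward another class while the blog-level label still anchors the entries that fit it. Other ways of using the blog-level label are equally plausible.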
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hagiwara, K., Takamura, H., Okumura, M. (2010). Constructing Blog Entry Classifiers Using Blog-Level Topic Labels. In: Cheng, PJ., Kan, MY., Lam, W., Nakov, P. (eds) Information Retrieval Technology. AIRS 2010. Lecture Notes in Computer Science, vol 6458. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17187-1_35
DOI: https://doi.org/10.1007/978-3-642-17187-1_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-17186-4
Online ISBN: 978-3-642-17187-1
eBook Packages: Computer Science, Computer Science (R0)