Cool Blog Classification from Positive and Unlabeled Examples

Sriphaew, Kritsada; Takamura, Hiroya; Okumura, Manabu

doi:10.1007/978-3-642-01307-2_9

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5476)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Abstract

We address the problem of classifying blogs as cool or not cool using only positive and unlabeled examples. We propose an algorithm, called PUB, that exploits unlabeled data together with the positive examples to predict whether unseen blogs are cool. PUB combines two techniques. A weighting technique assigns a weight to each unlabeled example, which is treated as negative in the training set. A bagging technique builds several weak classifiers, each learned on a small training set formed by randomly sampling some positive examples and some unlabeled examples, which are again treated as negative. Each weak classifier must either reach an admissible level of a performance measure evaluated on the entire set of labeled positive examples or be the best-performing classifier found within an iteration limit. A majority vote over all weak classifiers predicts the class of a test instance. The experimental results show that PUB correctly predicts the classes of unseen blogs in a setting that traditional learning from positive and negative examples cannot handle. The results also show that PUB outperforms other algorithms for learning from positive and unlabeled examples on the task of cool blog classification.
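
The abstract gives only the outline of PUB, so the following is a minimal sketch of that outline rather than the authors' implementation. Several pieces are assumptions made for illustration: the weak learner (a weighted nearest-centroid classifier), the weighting rule (an unlabeled example gets more weight the farther it lies from the centroid of the positives), the use of recall on the labeled positives as the performance measure, and all names and default parameters (`pu_bagging`, `min_recall`, `max_iters`, and so on).

```python
import random

def centroid(rows, weights):
    """Weighted mean of a list of equal-length feature vectors."""
    total = sum(weights)
    dim = len(rows[0])
    return [sum(w * r[j] for r, w in zip(rows, weights)) / total
            for j in range(dim)]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_weak(pos, neg, neg_weights):
    """Hypothetical weak learner: weighted nearest-centroid classifier."""
    c_pos = centroid(pos, [1.0] * len(pos))
    c_neg = centroid(neg, neg_weights)
    return lambda x: sq_dist(x, c_pos) <= sq_dist(x, c_neg)

def recall_on(clf, positives):
    """Fraction of labeled positives the classifier accepts."""
    return sum(clf(x) for x in positives) / len(positives)

def pu_bagging(positives, unlabeled, n_classifiers=11, sample_size=5,
               min_recall=0.8, max_iters=20, seed=0):
    rng = random.Random(seed)
    c_all = centroid(positives, [1.0] * len(positives))
    ensemble = []
    for _ in range(n_classifiers):
        best, best_recall = None, -1.0
        for _ in range(max_iters):
            # Small training set: sampled positives plus sampled
            # unlabeled examples that are assumed to be negative.
            pos_s = rng.sample(positives, min(sample_size, len(positives)))
            neg_s = rng.sample(unlabeled, min(sample_size, len(unlabeled)))
            # Assumed weighting rule: distance from the positive centroid.
            weights = [sq_dist(u, c_all) + 1e-9 for u in neg_s]
            clf = train_weak(pos_s, neg_s, weights)
            # Evaluate on the whole set of labeled positives.
            r = recall_on(clf, positives)
            if r > best_recall:
                best, best_recall = clf, r
            if r >= min_recall:
                break  # admissible performance reached
        # Otherwise keep the best classifier found within the limit.
        ensemble.append(best)

    def predict(x):
        # Majority vote over all weak classifiers.
        votes = sum(clf(x) for clf in ensemble)
        return votes * 2 > len(ensemble)

    return predict

if __name__ == "__main__":
    pos = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
           (-0.1, 0.2), (0.3, -0.2), (0.0, 0.4)]
    unl = [(5.0, 5.0), (5.2, 4.9), (4.8, 5.1), (5.1, 5.3),
           (4.9, 4.7), (0.1, 0.1), (0.2, 0.2), (5.3, 5.0)]
    predict = pu_bagging(pos, unl)
    print(predict((0.1, 0.2)), predict((5.0, 5.0)))
```

In this toy data two of the unlabeled points are hidden positives. The distance-based weights keep them from pulling the negative centroid toward the positive cluster, which is the role a weighting step of this kind is meant to play.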

Author information

Authors and Affiliations

Precision and Intelligence Laboratory, Tokyo Institute of Technology, 4259 Nagatsuta Midori-ku, Yokohama, 226-8503, Japan
Kritsada Sriphaew, Hiroya Takamura & Manabu Okumura

Editor information

Editors and Affiliations

Sirindhorn International Institute of Technology, Thammasat University, 131 Moo 5 Tiwanont Road, 12000, Bangkadi, Muang, Pathumthani, Thailand
Thanaruk Theeramunkong
Dept. of Computer Engineering, Faculty of Engineering, Chulalongkorn University, 10330, Bangkok, Thailand
Boonserm Kijsirikul
Faculty of Science & Engineering, York University, 355 Lumbers Building, 4700 Keele Street, M3J 1P3, Toronto, Ontario, Canada
Nick Cercone
School of Knowledge Science, Japan Advanced Institute of Science and Technology, 1-1 Asahidai, Nomi, 923-1292, Ishikawa, Japan
Tu-Bao Ho

Cite this paper

Sriphaew, K., Takamura, H., Okumura, M. (2009). Cool Blog Classification from Positive and Unlabeled Examples. In: Theeramunkong, T., Kijsirikul, B., Cercone, N., Ho, TB. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2009. Lecture Notes in Computer Science, vol 5476. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01307-2_9

DOI: https://doi.org/10.1007/978-3-642-01307-2_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01306-5
Online ISBN: 978-3-642-01307-2
eBook Packages: Computer Science, Computer Science (R0)
