Knowledge Supervised Text Classification with No Labeled Documents

Zhang, Congle; Xue, Gui-Rong; Yu, Yong

doi:10.1007/978-3-540-89197-0_47

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5351)

Abstract

In traditional text classification approaches, the semantic meanings of the classes are described by labeled documents. Since labeling documents is often time-consuming and expensive, a promising alternative is to ask users to provide a few keywords that describe each class instead of labeling any documents. However, a short list of keywords may not contain enough information and may therefore lead to an unreliable classifier. Fortunately, a large amount of public data is easily available in web directories such as ODP and Wikipedia. We are interested in exploiting the crowd intelligence contained in such public data to enhance text classification. In this paper, we propose a novel text classification framework called "Knowledge Supervised Learning" (KSL), which uses the knowledge in the keywords together with this crowd intelligence to learn a classifier without any labeled documents. We design a two-stage risk minimization (TSRM) approach for the KSL problem, which optimizes the expected prediction risk to build a high-quality classifier. Empirical results support this claim: our algorithm achieves a Micro-F1 above 0.9 on average, which is much better than the baselines and even comparable to an SVM classifier trained on labeled documents.
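For readers unfamiliar with the Micro-F1 score quoted in the abstract, the following is a minimal sketch of how it is commonly computed: true positives, false positives, and false negatives are pooled over all classes before precision and recall are combined. The function name `micro_f1` and the toy labels are illustrative assumptions; this page does not show the paper's datasets or evaluation code, so this is not a reproduction of the authors' setup.

```python
from collections import Counter


def micro_f1(y_true, y_pred):
    """Micro-averaged F1 for single-label predictions (hypothetical helper)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[guess] += 1   # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative

    # Pool the counts across classes, then compute F1 = 2TP / (2TP + FP + FN).
    tp_sum = sum(tp.values())
    fp_sum = sum(fp.values())
    fn_sum = sum(fn.values())
    denom = 2 * tp_sum + fp_sum + fn_sum
    return 2 * tp_sum / denom if denom else 0.0


# Toy labels, invented for illustration only.
y_true = ["sports", "sports", "politics", "tech"]
y_pred = ["sports", "politics", "politics", "tech"]
print(micro_f1(y_true, y_pred))
```

When every document receives exactly one predicted class, as in this sketch, the pooled score coincides with plain accuracy; the two differ only in multi-label settings.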

Author information

Authors and Affiliations

Apex Lab, Shanghai Jiaotong University, Shanghai, 200240, China
Congle Zhang & Gui-Rong Xue
State Key Lab of CAD & CG, Zhejiang University, Hangzhou, 310058, China
Congle Zhang & Gui-Rong Xue

Editor information

Editors and Affiliations

Japan Advanced Institute of Science and Technology, Asahidai 1-1, 923-12292, Nomi, Japan
Tu-Bao Ho
Department of Computer Science & Technology, Nanjing University, 22 Hankou Road, 210093, China
Zhi-Hua Zhou

Cite this paper

Zhang, C., Xue, GR., Yu, Y. (2008). Knowledge Supervised Text Classification with No Labeled Documents. In: Ho, TB., Zhou, ZH. (eds) PRICAI 2008: Trends in Artificial Intelligence. PRICAI 2008. Lecture Notes in Computer Science, vol 5351. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89197-0_47

DOI: https://doi.org/10.1007/978-3-540-89197-0_47
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-89196-3
Online ISBN: 978-3-540-89197-0
eBook Packages: Computer Science (R0)
