Best-Match Method Used in Co-training Algorithm

Wang, Hui; Ji, Liping; Zuo, Wanli

doi:10.1007/978-3-540-77018-3_40


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4819)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining


Abstract

Since 1998 there has been significant interest in supervised learning algorithms that combine labeled and unlabeled data for text learning tasks. The co-training algorithm applies to datasets whose features separate naturally into two disjoint sets. In this paper, we demonstrate that, when learning from labeled and unlabeled data with the co-training algorithm, first selecting the document examples whose two feature sets match best can yield good performance.
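The abstract does not spell out the selection criterion, so the following is only a minimal sketch of a co-training loop over two disjoint feature sets, not the authors' method. It assumes that "best match" means both view classifiers predict the same label and that candidates are ranked by the product of the two confidences. The `NaiveBayes` class, the `co_train` function, its `rounds` and `per_round` parameters, and the toy documents are all hypothetical names and data introduced for illustration.

```python
import math
from collections import Counter


class NaiveBayes:
    """Tiny multinomial naive Bayes over token lists (Laplace smoothing)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.counts = {c: Counter() for c in self.classes}
        for tokens, label in zip(docs, labels):
            self.counts[label].update(tokens)
        self.vocab = {t for c in self.classes for t in self.counts[c]}
        return self

    def predict_proba(self, tokens):
        n = sum(self.prior.values())
        log_scores = {}
        for c in self.classes:
            total = sum(self.counts[c].values()) + len(self.vocab)
            score = math.log(self.prior[c] / n)
            for t in tokens:
                if t in self.vocab:  # unseen tokens are ignored
                    score += math.log((self.counts[c][t] + 1) / total)
            log_scores[c] = score
        top = max(log_scores.values())
        exp = {c: math.exp(s - top) for c, s in log_scores.items()}
        z = sum(exp.values())
        return {c: v / z for c, v in exp.items()}


def fit_views(labeled):
    """Train one classifier per feature set (view)."""
    labels = [y for _, y in labeled]
    clf_a = NaiveBayes().fit([x[0] for x, _ in labeled], labels)
    clf_b = NaiveBayes().fit([x[1] for x, _ in labeled], labels)
    return clf_a, clf_b


def co_train(labeled, unlabeled, rounds=5, per_round=1):
    """labeled: [((view_a, view_b), label)]; unlabeled: [(view_a, view_b)]."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        clf_a, clf_b = fit_views(labeled)
        scored = []
        for i, (va, vb) in enumerate(pool):
            pa, pb = clf_a.predict_proba(va), clf_b.predict_proba(vb)
            ya, yb = max(pa, key=pa.get), max(pb, key=pb.get)
            # ASSUMPTION: "best match" = the two views agree on the label;
            # candidates are ranked by joint confidence.
            if ya == yb:
                scored.append((pa[ya] * pb[yb], i, ya))
        if not scored:
            break
        chosen = sorted(scored, reverse=True)[:per_round]
        labeled.extend((pool[i], y) for _, i, y in chosen)
        taken = {i for _, i, _ in chosen}
        pool = [x for i, x in enumerate(pool) if i not in taken]
    clf_a, clf_b = fit_views(labeled)
    return clf_a, clf_b, labeled


# Toy data: view A holds body tokens, view B holds link/anchor tokens.
seed = [
    ((["goal", "match"], ["sports", "news"]), "sport"),
    ((["stock", "market"], ["finance", "news"]), "finance"),
]
pool = [(["goal", "team"], ["sports"]), (["market", "shares"], ["finance"])]
clf_a, clf_b, grown = co_train(seed, pool)
```

Here `grown` is the labeled set after the loop has absorbed the pool examples on which both views agree. Ranking by joint confidence is one plausible reading of "best matching"; other agreement measures could be substituted in the marked line.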



Author information

Authors and Affiliations

College of Computer Science and Technology, Jilin University, Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, Changchun 130012, China
Hui Wang, Liping Ji & Wanli Zuo


Editor information

Takashi Washio, Zhi-Hua Zhou, Joshua Zhexue Huang, Xiaohua Hu, Jinyan Li, Chao Xie, Jieyue He, Deqing Zou, Kuan-Ching Li, Mário M. Freire

Cite this paper

Wang, H., Ji, L., Zuo, W. (2007). Best-Match Method Used in Co-training Algorithm. In: Washio, T., et al. Emerging Technologies in Knowledge Discovery and Data Mining. PAKDD 2007. Lecture Notes in Computer Science, vol 4819. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77018-3_40


DOI: https://doi.org/10.1007/978-3-540-77018-3_40
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-77016-9
Online ISBN: 978-3-540-77018-3
eBook Packages: Computer Science, Computer Science (R0)
