Comparison of Adjusted Methods for Selecting Useful Unlabeled Data for Semi-Supervised Learning Algorithms

Le, Thanh-Binh; Kim, Sang-Woon

doi:10.1007/978-3-319-19066-2_51

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9101)

Included in the following conference series:

International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems

Abstract

This paper compares methods for selecting a small amount of useful unlabeled data to improve the classification accuracy of semi-supervised learning (SSL) algorithms. In particular, three selection approaches are considered and compared empirically: the simply adjusted approach based on an uncertainty level, the normalized-and-adjusted approach, and the entropy-based adjusted approach. The experimental results, obtained from synthetic and real-life benchmark data using semi-supervised support vector machines (S3VMs), demonstrate that the entropy-based approach works slightly better than the others in terms of classification accuracy.
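
As a rough illustration of what an entropy-based selection step could look like, the sketch below ranks unlabeled samples by the Shannon entropy of their estimated class posteriors and keeps the k lowest-entropy ones. This is an assumption-laden example, not the authors' criterion: the abstract does not give the selection formula, so the function names (shannon_entropy, select_by_entropy), the choice to prefer low entropy, and the toy posterior values are all hypothetical.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy (in nats) of one discrete probability vector."""
    # Terms with p == 0 contribute nothing, so they are skipped.
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_by_entropy(posteriors, k):
    """Return indices of the k samples with the lowest posterior entropy.

    Assumption: "useful" is read here as "confidently classified",
    i.e. low entropy. The paper's actual criterion may differ.
    """
    ranked = sorted(
        range(len(posteriors)),
        key=lambda i: shannon_entropy(posteriors[i]),
    )
    return ranked[:k]


# Hypothetical posterior estimates for four unlabeled samples (two classes).
posteriors = [
    [0.50, 0.50],  # maximally uncertain
    [0.95, 0.05],
    [0.70, 0.30],
    [0.01, 0.99],  # most confident
]

selected = select_by_entropy(posteriors, k=2)
print(selected)
```

In an SSL loop, the selected indices would typically be the only unlabeled samples handed to the learner; how the posteriors are estimated and adjusted is left open here.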

Author information

Authors and Affiliations

Department of Computer Engineering, Myongji University, Yongin, 449-728, Korea
Thanh-Binh Le & Sang-Woon Kim

Corresponding author

Correspondence to Sang-Woon Kim.

Editor information

Editors and Affiliations

Texas State University, San Marcos, Texas, USA
Moonis Ali
Dongguk University, Seoul, Korea, Republic of (South Korea)
Young Sig Kwon
Dongguk University, Seoul, Korea, Republic of (South Korea)
Chang-Hwan Lee
Dongguk University, Seoul, Korea, Republic of (South Korea)
Juntae Kim
Seoul National University, Seoul, Korea, Republic of (South Korea)
Yongdai Kim

About this paper

Cite this paper

Le, TB., Kim, SW. (2015). Comparison of Adjusted Methods for Selecting Useful Unlabeled Data for Semi-Supervised Learning Algorithms. In: Ali, M., Kwon, Y., Lee, CH., Kim, J., Kim, Y. (eds) Current Approaches in Applied Artificial Intelligence. IEA/AIE 2015. Lecture Notes in Computer Science, vol 9101. Springer, Cham. https://doi.org/10.1007/978-3-319-19066-2_51

DOI: https://doi.org/10.1007/978-3-319-19066-2_51
Published: 01 May 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19065-5
Online ISBN: 978-3-319-19066-2
eBook Packages: Computer Science, Computer Science (R0)
