Abstract
In this paper, we propose a method to improve nearest neighbor classification accuracy under a semi-supervised setting. We call our approach GS4 (i.e., Generating Synthetic Samples Semi-Supervised). Existing self-training approaches classify unlabeled samples by exploiting local information. These samples are then incorporated into the training set of labeled data. However, errors are propagated and misclassifications at an early stage severely degrade the classification accuracy. To address this problem, the proposed method exploits the unlabeled data by using weights proportional to the classification confidence to generate synthetic samples. Specifically, our scheme is inspired by the Synthetic Minority Over-Sampling Technique. That is, each unlabeled sample is used to generate as many labeled samples as the number of classes represented by its \(k\)-nearest neighbors. In particular, the distance of each synthetic sample from its \(k\)-nearest neighbors of the same class is proportional to the classification confidence. As a result, the robustness to misclassification errors is increased and better accuracy is achieved. Experimental results using publicly available datasets demonstrate that statistically significant improvements are obtained when the proposed approach is employed.
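The generation step described in the abstract can be sketched as follows. This is a minimal illustration based only on the abstract's description, not the authors' implementation: the function name `gs4_synthetic_samples`, the use of the class fraction among the \(k\) neighbors as the classification confidence, and the linear interpolation between a same-class neighbor and the unlabeled point are all assumptions.

```python
import numpy as np

def gs4_synthetic_samples(X_l, y_l, u, k=5, rng=None):
    """Sketch of GS4-style generation for one unlabeled point `u`.

    For each class represented among the k nearest labeled neighbors of
    `u`, emit one synthetic labeled sample placed between a same-class
    neighbor and `u`, with the step toward `u` proportional to the
    classification confidence (here assumed to be the fraction of the
    k neighbors belonging to that class).
    """
    rng = np.random.default_rng() if rng is None else rng
    d = np.linalg.norm(X_l - u, axis=1)      # distances from u to the labeled set
    nn = np.argsort(d)[:k]                   # indices of the k nearest neighbors
    synthetic = []
    for c in np.unique(y_l[nn]):
        members = nn[y_l[nn] == c]
        conf = len(members) / k              # assumed confidence for class c
        x_nn = X_l[rng.choice(members)]      # a randomly chosen same-class neighbor
        # Higher confidence -> synthetic sample placed closer to u itself.
        x_syn = x_nn + conf * (u - x_nn)
        synthetic.append((x_syn, c))
    return synthetic
```

For example, if all \(k\) neighbors of `u` share one class, the confidence is 1 and the single synthetic sample coincides with `u`; if the neighborhood is split across classes, each class contributes a more conservative sample pulled toward its own region.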
References
Brown, M., Forsythe, A.: Robust tests for the equality of variances. J. Am. Stat. Assoc. 69(346), 364–367 (1974)
Chapelle, O., Schölkopf, B., Zien, A.: Semi-supervised Learning, vol. 2. MIT Press, Cambridge (2006)
Chawla, N., Bowyer, K., Hall, L., Kegelmeyer, W.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002)
Cohen, I., Cozman, F., Sebe, N., Cirelo, M., Huang, T.: Semisupervised learning of classifiers: theory, algorithms, and their application to human-computer interaction. IEEE Trans. Pattern Anal. Mach. Intell. 26(12), 1553–1566 (2004)
Cover, T., Hart, P.: Nearest neighbor pattern classification. IEEE Trans. Inf. Theor. 13(1), 21–27 (1967)
Dean, N., Murphy, T., Downey, G.: Using unlabelled data to update classification rules with applications in food authenticity studies. J. Roy. Stat. Soc. Ser. C (Appl. Stat.) 55(1), 1–14 (2006)
Ghosh, A.: A probabilistic approach for semi-supervised nearest neighbor classification. Pattern Recogn. Lett. 33(9), 1127–1133 (2012)
Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, New York (2009)
Merz, C., Murphy, P., Aha, D.: UCI repository of machine learning databases. Department of Information and Computer Science, University of California (2012)
Hollander, M., Wolfe, D.: Nonparametric Statistical Methods. Wiley Series in Probability and Statistics. Wiley, New York (1973)
Zhou, D., Bousquet, O., Lal, T., Weston, J., Schölkopf, B.: Learning with local and global consistency. Adv. Neural Inf. Process. Syst. 16(16), 321–328 (2004)
Zhu, X., Ghahramani, Z.: Learning from labeled and unlabeled data with label propagation. Technical report, CMU-CALD-02-107, Carnegie Mellon University (2002)
Zhu, X., Goldberg, A.: Introduction to semi-supervised learning. Synth. Lect. Artif. Intell. Mach. Learn. 3(1), 1–130 (2009)
Acknowledgments
This research was funded in part by the US Army Research Lab (W911NF-13-1-0127) and the UH Hugh Roy and Lillie Cranz Cullen Endowment Fund. All statements of fact, opinion or conclusions contained herein are those of the authors and should not be construed as representing the official views or policies of the sponsors.
Copyright information
© 2014 Springer International Publishing Switzerland
Cite this paper
Moutafis, P., Kakadiaris, I.A. (2014). GS4: Generating Synthetic Samples for Semi-Supervised Nearest Neighbor Classification. In: Peng, W.-C., et al. (eds.) Trends and Applications in Knowledge Discovery and Data Mining. PAKDD 2014. Lecture Notes in Computer Science, vol. 8643. Springer, Cham. https://doi.org/10.1007/978-3-319-13186-3_36
Print ISBN: 978-3-319-13185-6
Online ISBN: 978-3-319-13186-3