
GS4: Generating Synthetic Samples for Semi-Supervised Nearest Neighbor Classification

  • Conference paper
Trends and Applications in Knowledge Discovery and Data Mining (PAKDD 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8643)


Abstract

In this paper, we propose a method to improve nearest neighbor classification accuracy under a semi-supervised setting. We call our approach GS4 (i.e., Generating Synthetic Samples Semi-Supervised). Existing self-training approaches classify unlabeled samples by exploiting local information. These samples are then incorporated into the training set of labeled data. However, errors are propagated and misclassifications at an early stage severely degrade the classification accuracy. To address this problem, the proposed method exploits the unlabeled data by using weights proportional to the classification confidence to generate synthetic samples. Specifically, our scheme is inspired by the Synthetic Minority Over-Sampling Technique. That is, each unlabeled sample is used to generate as many labeled samples as the number of classes represented by its \(k\)-nearest neighbors. In particular, the distance of each synthetic sample from its \(k\)-nearest neighbors of the same class is proportional to the classification confidence. As a result, the robustness to misclassification errors is increased and better accuracy is achieved. Experimental results using publicly available datasets demonstrate that statistically significant improvements are obtained when the proposed approach is employed.
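The abstract's generation rule can be sketched in code. The following is a minimal illustration, not the authors' implementation: it assumes the classification confidence for a class is the fraction of the unlabeled sample's \(k\)-nearest labeled neighbors belonging to that class, and that each synthetic sample is placed along the segment between a same-class neighbor and the unlabeled point, at a distance from that neighbor proportional to the confidence. The function name `gs4_synthetic_samples` and these choices are assumptions; the exact GS4 weighting scheme is given in the paper itself.

```python
import numpy as np

def gs4_synthetic_samples(X_l, y_l, X_u, k=5):
    """Sketch of SMOTE-style synthetic generation guided by k-NN confidence.

    X_l, y_l : labeled samples and their labels
    X_u      : unlabeled samples
    Returns synthetic samples and the labels assigned to them.
    """
    synth_X, synth_y = [], []
    for x in X_u:
        # Distances from the unlabeled sample to every labeled sample
        d = np.linalg.norm(X_l - x, axis=1)
        nn = np.argsort(d)[:k]          # indices of the k nearest labeled samples
        labels = y_l[nn]
        # One synthetic sample per class represented among the k neighbors
        for c in np.unique(labels):
            same = nn[labels == c]      # same-class neighbors, nearest first
            conf = len(same) / k        # assumed confidence: neighbor-class fraction
            ref = X_l[same[0]]          # nearest same-class neighbor
            # Distance from the same-class neighbor is proportional to conf:
            # high confidence places the synthetic sample close to x itself.
            synth_X.append(ref + conf * (x - ref))
            synth_y.append(c)
    return np.array(synth_X), np.array(synth_y)
```

Under this reading, an unlabeled point whose neighborhood is dominated by one class yields a single high-confidence synthetic sample near the point itself, while a point on a class boundary yields several cautious samples, each pulled back toward its own class.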


References

  1. Brown, M., Forsythe, A.: Robust tests for the equality of variances. J. Am. Stat. Assoc. 69(346), 364–367 (1974)

  2. Chapelle, O., Schölkopf, B., Zien, A.: Semi-supervised Learning, vol. 2. MIT Press, Cambridge (2006)

  3. Chawla, N., Bowyer, K., Hall, L., Kegelmeyer, W.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002)

  4. Cohen, I., Cozman, F., Sebe, N., Cirelo, M., Huang, T.: Semisupervised learning of classifiers: theory, algorithms, and their application to human-computer interaction. IEEE Trans. Pattern Anal. Mach. Intell. 26(12), 1553–1566 (2004)

  5. Cover, T., Hart, P.: Nearest neighbor pattern classification. IEEE Trans. Inf. Theor. 13(1), 21–27 (1967)

  6. Dean, N., Murphy, T., Downey, G.: Using unlabelled data to update classification rules with applications in food authenticity studies. J. Roy. Stat. Soc. Ser. C (Appl. Stat.) 55(1), 1–14 (2006)

  7. Ghosh, A.: A probabilistic approach for semi-supervised nearest neighbor classification. Pattern Recogn. Lett. 33(9), 1127–1133 (2012)

  8. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, New York (2009)

  9. Merz, C., Murphy, P., Aha, D.: UCI repository of machine learning databases. Department of Information and Computer Science, University of California (2012)

  10. Wolfe, D., Hollander, M.: Nonparametric Statistical Methods. Wiley Series in Probability and Statistics. Wiley, New York (1973)

  11. Zhou, D., Bousquet, O., Lal, T., Weston, J., Schölkopf, B.: Learning with local and global consistency. Adv. Neural Inf. Process. Syst. 16(16), 321–328 (2004)

  12. Zhu, X., Ghahramani, Z.: Learning from labeled and unlabeled data with label propagation. Technical report, CMU-CALD-02-107, Carnegie Mellon University (2002)

  13. Zhu, X., Goldberg, A.: Introduction to semi-supervised learning. Synth. Lect. Artif. Intell. Mach. Learn. 3(1), 1–130 (2009)


Acknowledgments

This research was funded in part by the US Army Research Lab (W911NF-13-1-0127) and the UH Hugh Roy and Lillie Cranz Cullen Endowment Fund. All statements of fact, opinion or conclusions contained herein are those of the authors and should not be construed as representing the official views or policies of the sponsors.

Author information

Corresponding author

Correspondence to Panagiotis Moutafis.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Moutafis, P., Kakadiaris, I.A. (2014). GS4: Generating Synthetic Samples for Semi-Supervised Nearest Neighbor Classification. In: Peng, WC., et al. Trends and Applications in Knowledge Discovery and Data Mining. PAKDD 2014. Lecture Notes in Computer Science, vol 8643. Springer, Cham. https://doi.org/10.1007/978-3-319-13186-3_36

  • DOI: https://doi.org/10.1007/978-3-319-13186-3_36

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-13185-6

  • Online ISBN: 978-3-319-13186-3

  • eBook Packages: Computer Science, Computer Science (R0)
