Abstract
We introduce a novel classification algorithm, the membership score machine (MSM), for highly nonlinear classification of small data; it is particularly well suited to highly irregular, non-globular datasets with arbitrarily many classes. Given a training dataset, the method applies within-class clustering and dimensionality reduction to extract useful geometric features for each class. For the data points in a class, the method first performs clustering and then applies principal component analysis (PCA) to each cluster to form a reliable low-dimensional geometric representation. The goal of the training stage is to extract reliable geometric features and an associated anisotropic measure, one for each cluster. At the prediction stage, the algorithm computes membership scores from these anisotropic measures with respect to each of the clusters; a test point is assigned to the class containing the cluster that yields the maximum membership score. The proposed algorithm, the MSM, turns out to be scalable and more accurate than existing algorithms, especially on small and highly irregular datasets. The main idea behind the MSM is to represent the dataset geometrically as a combination of multiple easy-to-classify clusters, each transformed into low-dimensional principal components. Numerical experiments are presented and compared with popular existing classifiers to demonstrate its superior performance.
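The pipeline sketched in the abstract (within-class clustering, per-cluster PCA, then classification by maximum membership score) can be illustrated with a minimal numpy-only prototype. This is a hypothetical sketch, not the authors' implementation: the clustering method, the number of clusters per class, and the exact form of the anisotropic membership score are not specified in the abstract, so plain Lloyd's k-means and a Mahalanobis-like score in each cluster's principal axes are assumed here.

```python
import numpy as np

def _kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's k-means: an illustrative stand-in for the
    within-class clustering step."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

class MembershipScoreMachine:
    """Hypothetical MSM sketch: cluster each class, fit a PCA per
    cluster, and classify a test point by the cluster giving the
    largest anisotropic membership score."""

    def __init__(self, n_clusters=2, n_components=2):
        self.n_clusters, self.n_components = n_clusters, n_components

    def fit(self, X, y):
        # One entry per cluster: (class label, mean, principal axes, variances)
        self.clusters_ = []
        for label in np.unique(y):
            Xc = X[y == label]
            k = min(self.n_clusters, len(Xc))
            labels = _kmeans(Xc, k)
            for j in range(k):
                Xj = Xc[labels == j]
                if len(Xj) < 2:
                    continue
                mean = Xj.mean(axis=0)
                # PCA via eigendecomposition of the cluster covariance
                evals, evecs = np.linalg.eigh(np.cov(Xj.T))
                d = min(self.n_components, Xj.shape[1])
                axes = evecs[:, ::-1][:, :d].T            # top-d principal axes
                var = np.maximum(evals[::-1][:d], 1e-8)   # their variances (floored)
                self.clusters_.append((label, mean, axes, var))
        return self

    def _score(self, x, mean, axes, var):
        # Mahalanobis-like score in PCA coordinates; larger = stronger membership
        z = axes @ (x - mean)
        return -np.sum(z ** 2 / var)

    def predict(self, X):
        return np.array([max(self.clusters_,
                             key=lambda c: self._score(x, c[1], c[2], c[3]))[0]
                         for x in X])

# Toy usage: two well-separated 2-D classes
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (40, 2)), rng.normal(5.0, 0.1, (40, 2))])
y = np.repeat([0, 1], 40)
msm = MembershipScoreMachine().fit(X, y)
pred = msm.predict(np.array([[0.0, 0.0], [5.0, 5.0]]))
```

Because each cluster carries its own principal axes and variances, the score naturally adapts to elongated, non-globular cluster shapes, which is the geometric intuition the abstract describes.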
Acknowledgements
The work of Byungjoon Lee is partially supported by The Catholic University of Korea, Research Fund, 2020 and the National Research Foundation (NRF) of Korea under Grant NRF-2020R1A2C4002378.
Cite this article
Lee, B., Kim, H. & Kim, S. Membership score machine for highly nonlinear classification for small data. Appl Intell 53, 6511–6524 (2023). https://doi.org/10.1007/s10489-022-03652-8