Abstract
In practical applications, balanced data are rare and skewed class distributions are the norm, which poses a severe challenge to classification algorithms. Existing imbalanced data classification methods are designed mainly for binary problems and are difficult to extend to multiple classes. In this study, we introduce Lie group machine learning and propose a semi-supervised learning algorithm based on the linear Lie group. First, the sample set is represented as a matrix, the linear Lie group GL(n) that is isomorphic (or homomorphic) to the corresponding learning system is found, and the labeled data are used to represent the object to be learned in terms of this linear Lie group. Then, the data are labeled by the group method according to the algebraic structure of the linear Lie group. We performed experiments on 18 benchmark multi-class imbalanced datasets, evaluating the proposed method alongside four state-of-the-art learning algorithms with three measures: mean accuracy, mean F-measure, and mean area under the curve (AUC). The experimental results demonstrate that the proposed method is effective and improves classification performance.
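The abstract names mean accuracy and mean F-measure as evaluation measures but does not say how the means are taken. The sketch below assumes one common reading for multi-class imbalanced data, macro averaging, in which every class contributes equally regardless of its size; the paper may instead average over datasets or repeated runs. The function names are illustrative and are not taken from the paper.

```python
def mean_per_class_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall (assumed reading of 'mean accuracy')."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        total = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)


def macro_f_measure(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed reading of 'mean F-measure')."""
    classes = sorted(set(y_true))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # A class with no true or predicted samples scores 0 by convention here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy imbalanced example: a classifier that always predicts the majority class.
y_true = [0] * 8 + [1, 2]
y_pred = [0] * 10
plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_accuracy)                           # overall fraction correct
print(mean_per_class_accuracy(y_true, y_pred))  # penalizes the missed minority classes
print(macro_f_measure(y_true, y_pred))
```

In this toy case the plain accuracy is high because the majority class dominates, while both macro-averaged measures are low, which is why class-balanced measures are usually preferred for imbalanced problems.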
Acknowledgements
The authors would like to thank two anonymous reviewers for carefully reviewing this letter and giving valuable comments to improve this paper.
Cite this article
Xu, C., Zhu, G. Semi-supervised Learning Algorithm Based on Linear Lie Group for Imbalanced Multi-class Classification. Neural Process Lett 52, 869–889 (2020). https://doi.org/10.1007/s11063-020-10287-8
DOI: https://doi.org/10.1007/s11063-020-10287-8