A Novel Method to Create Synthetic Samples with Autoencoder Multi-layer Extreme Learning Machine

He, Yulin; Huang, Qihang; Xu, Shengsheng; Huang, Joshua Zhexue

doi:10.1007/978-3-031-11217-1_2

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13248)

Included in the following conference series:

International Conference on Database Systems for Advanced Applications

Abstract

Imbalanced classification is an important branch of supervised learning and plays an important role in many application fields. Compared with making sophisticated improvements to classification algorithms, it is easier to obtain good performance by synthesizing minority class samples so that the classification algorithms can be trained on balanced data sets. In consideration of the strong representation ability of the multi-layer extreme learning machine (ML-ELM), this paper proposes a new method to create synthetic minority class samples based on an autoencoder ML-ELM (abbreviated as AE-MLELM-SynMin). First, an AE-MLELM is trained to obtain the deep feature encodings of the original minority class samples. Second, crossover and mutation operations are performed on the original deep feature encodings to generate a number of new deep feature encodings. Third, the synthetic minority class samples are created by transforming the new deep feature encodings with the AE-MLELM. Finally, experiments are conducted to demonstrate the effectiveness of the AE-MLELM-SynMin method. The experimental results show that the method obtains better imbalanced classification performance than the SMOTE, Borderline-SMOTE, Random-SMOTE, and SMOTE-IPF methods.
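The three steps in the abstract (encode, recombine, decode) can be illustrated with a rough NumPy sketch. This is not the authors' implementation: the abstract does not specify the network architecture, the crossover or mutation operators, or any hyperparameters, so everything below is an assumption made for illustration. In particular, the class names `ELMAutoencoderLayer` and `StackedELMAutoencoder`, the tanh activation, the ridge constant `reg`, single-point crossover, Gaussian mutation, and the layer sizes are all hypothetical choices.

```python
import numpy as np


class ELMAutoencoderLayer:
    """One ELM-style autoencoder layer (assumed form): a fixed random
    encoder and a decoder solved in closed form by ridge regression."""

    def __init__(self, n_hidden, reg=1e3, rng=None):
        self.n_hidden = n_hidden
        self.reg = reg  # hypothetical regularisation constant C
        self.rng = rng if rng is not None else np.random.default_rng()

    def fit(self, X):
        n_features = X.shape[1]
        # Random input weights and biases are drawn once and never trained.
        self.W = self.rng.standard_normal((n_features, self.n_hidden))
        self.b = self.rng.standard_normal(self.n_hidden)
        H = self.encode(X)
        # Output weights: beta = (I/C + H^T H)^{-1} H^T X, so that H @ beta ~ X.
        A = np.eye(self.n_hidden) / self.reg + H.T @ H
        self.beta = np.linalg.solve(A, H.T @ X)
        return self

    def encode(self, X):
        return np.tanh(X @ self.W + self.b)

    def decode(self, H):
        return H @ self.beta


class StackedELMAutoencoder:
    """Layers trained greedily; decoding applies the decoders in reverse."""

    def __init__(self, layer_sizes, reg=1e3, seed=0):
        rng = np.random.default_rng(seed)
        self.layers = [ELMAutoencoderLayer(h, reg, rng) for h in layer_sizes]

    def fit(self, X):
        Z = X
        for layer in self.layers:
            Z = layer.fit(Z).encode(Z)
        return self

    def encode(self, X):
        for layer in self.layers:
            X = layer.encode(X)
        return X

    def decode(self, Z):
        for layer in reversed(self.layers):
            Z = layer.decode(Z)
        return Z


def crossover_and_mutate(codes, n_new, mutation_rate=0.1,
                         mutation_scale=0.05, seed=0):
    """Assumed operators: single-point crossover between two random parents,
    then sparse Gaussian mutation. Needs at least 2 codes of dimension >= 2."""
    rng = np.random.default_rng(seed)
    n, d = codes.shape
    children = np.empty((n_new, d))
    for i in range(n_new):
        a, b = rng.choice(n, size=2, replace=False)
        point = rng.integers(1, d)  # cut position in [1, d - 1]
        children[i, :point] = codes[a, :point]
        children[i, point:] = codes[b, point:]
    mask = rng.random(children.shape) < mutation_rate
    children = children + mask * rng.normal(0.0, mutation_scale, children.shape)
    # tanh encodings lie in (-1, 1), so keep mutated codes in that range.
    return np.clip(children, -1.0, 1.0)


def synthesize_minority(X_min, n_new, layer_sizes=(16, 8), seed=0):
    """Encode minority samples, recombine the codes, decode the new codes."""
    model = StackedELMAutoencoder(layer_sizes, seed=seed).fit(X_min)
    codes = model.encode(X_min)
    new_codes = crossover_and_mutate(codes, n_new, seed=seed)
    return model.decode(new_codes)


# Usage on toy data: 30 minority samples with 5 features, 70 synthetic ones.
X_min = np.random.default_rng(42).normal(size=(30, 5))
X_syn = synthesize_minority(X_min, n_new=70)
print(X_syn.shape)  # (70, 5)
```

Recombining in the encoding space rather than the input space is the design point the abstract emphasises; the synthetic rows returned here would then be appended to the minority class before training a classifier.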

Acknowledgement

The authors would like to thank the chairs and anonymous reviewers, whose meticulous reading and valuable suggestions helped them improve this paper significantly. This paper was supported by the National Natural Science Foundation of China (61972261), the Basic Research Foundation of Shenzhen (JCYJ 20210324093609026, JCYJ 20200813091134001), and the Scientific Research Foundation of Shenzhen University for Newly-introduced Teachers (860/000002110628).

Author information

Authors and Affiliations

Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), Shenzhen, 518107, China
Yulin He & Joshua Zhexue Huang
College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, 518060, China
Yulin He, Qihang Huang, Shengsheng Xu & Joshua Zhexue Huang

Corresponding author

Correspondence to Yulin He.

Editor information

Editors and Affiliations

University of Aizu, Aizu, Japan
Uday Kiran Rage
Indraprastha Institute of Information Technology, Delhi, India
Vikram Goyal
Data Sciences and Analytics Center, International Institute of Information Technology, Hyderabad, Telangana, India
P. Krishna Reddy

About this paper

Cite this paper

He, Y., Huang, Q., Xu, S., Huang, J.Z. (2022). A Novel Method to Create Synthetic Samples with Autoencoder Multi-layer Extreme Learning Machine. In: Rage, U.K., Goyal, V., Reddy, P.K. (eds) Database Systems for Advanced Applications. DASFAA 2022 International Workshops. DASFAA 2022. Lecture Notes in Computer Science, vol 13248. Springer, Cham. https://doi.org/10.1007/978-3-031-11217-1_2

DOI: https://doi.org/10.1007/978-3-031-11217-1_2
Published: 16 July 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-11216-4
Online ISBN: 978-3-031-11217-1
eBook Packages: Computer Science, Computer Science (R0)
