Abstract
A hybrid sampling technique is proposed by combining Fuzzy C-Mean Clustering and Synthetic Minority Oversampling Technique (FCMSMT) for tackling the imbalanced multiclass classification problem. The mean number of classes is used as the number of instances for applying undersampling and oversampling. Using the mean as the fixed number of the required instances for each class can prevent the within-class imbalance data from being eliminated erroneously during undersampling. This technique can decrease both within-class and between-class errors, and thus can increase the classification performance. The study was conducted using eight benchmark datasets from KEEL and UCI repositories and the results were compared against three major classifiers based on G-mean and AUC measurements. The results reveal that the proposed technique could handle most of the multiclass imbalanced datasets used in the experiments for all classifiers and retain the integrity of the original data.
This is a preview of subscription content, log in via an institution.
References
López, V., Fernández, A., Herrera, F.: On the importance of the validation technique for classification with imbalanced datasets: addressing covariate shift when data is skewed. Inf. Sci. 257, 1–13 (2014)
Agrawal, A., Viktor, H.L., Paquet, E.: SCUT: multi-class imbalanced data classification using SMOTE and cluster-based undersampling. In: 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3 K), pp. 226–234. Lisbon (2015)
Jeatrakul, P., Wong, K.W., Fung, C.C.: Classification of imbalanced data by combining the complementary neural network and SMOTE algorithm. In: Wong, K.W., Mendis, B.S.U., Bouzerdoum, A. (eds.) ICONIP 2010. LNCS, vol. 6444, pp. 152–159. Springer, Heidelberg (2010). doi:10.1007/978-3-642-17534-3_19
Ou, G., Murphey, Y.L.: Multi-class pattern classification using neural networks. Pattern Recogn. 40(1), 4–18 (2007)
Fernández, A., del Jesus, M.J., Herrera, F.: Multi-class imbalanced data-sets with linguistic fuzzy rule based classification systems based on pairwise learning. In: Hüllermeier, E., Kruse, R., Hoffmann, F. (eds.) IPMU 2010. LNCS, vol. 6178, pp. 89–98. Springer, Heidelberg (2010). doi:10.1007/978-3-642-14049-5_10
Wang, S., Yao, X.: Multiclass imbalance problems: analysis and potential solutions. IEEE Trans. Syst. Man Cybern. Part B Cybern. 42(4), 1119–1130 (2012)
Rahman, M., Davis, D.N.: Addressing the class imbalance problem in medical datasets. Int. J. Mach. Learn. Comput. 3(2), 224–228 (2013)
Lin, W.C., Tsai, C.F., Hu, Y.H., Jhang, J.S.: Clustering-based undersampling in class-imbalanced data. Inf. Sci. 409–410, 17–26 (2017)
Kocyigit, Y., Seker, H.: Imbalanced data classifier by using ensemble fuzzy c-means clustering. In: The IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI 2012), pp. 952–955. Hong Kong (2012)
Chawla, N., Bowyer, K., Hall, L., Kegelmeyer, P.: SMOTE: synthetic minority over-sampling technique. Artif. Intell. Res. 16(1), 321–357 (2002)
Abdi, L., Hashemi, S.: To combat multi-class imbalanced problems by means of over-sampling techniques. IEEE Trans. Knowl. Data Eng. 28(1), 238–251 (2016)
Jian, C., Gao, J., Ao, Y.: A new sampling method for classifying imbalanced data based on support vector machine ensemble. Neurocomputing 193(1), 115–122 (2016)
KEEL Data-Mining Software Tool: Data Set Repository. http://sci2s.ugr.es/keel/imbalanced.php. Accessed 30 May 2017
Lichman, M.: UCI Machine Learning Repository. http://archive.ics.uci.edu/ml. Accessed 30 May 2017
Dumitru, C., Maria, V.: Advantages and disadvantages of using neural networks for predictions. Ovidius University Ann. Econ. Sci. Ser. 13(1), 444–449 (2013)
Karamizadeh, S., Abdullah, S.M., Halimi, M., Shayan, J., Rajabi, M.J.: Advantage and drawback of support vector machine functionality. In: International Conference on Computer, Communications, and Control Technology (I4CT 2014), pp. 63–65. Langkawi (2014)
Batuwita, R., Palade, V.: FSVM-CIL: fuzzy support vector machines for class imbalance learning. IEEE Trans. Fuzzy Syst. 18(3), 558–571 (2010)
Pruengkarn, R., Wong, K.W., Fung, C.C.: Data cleaning using complementary fuzzy support vector machine technique. In: Hirose, A., Ozawa, S., Doya, K., Ikeda, K., Lee, M., Liu, D. (eds.) ICONIP 2016. LNCS, vol. 9948, pp. 160–167. Springer, Cham (2016). doi:10.1007/978-3-319-46672-9_19
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Pruengkarn, R., Wong, K.W., Fung, C.C. (2017). Multiclass Imbalanced Classification Using Fuzzy C-Mean and SMOTE with Fuzzy Support Vector Machine. In: Liu, D., Xie, S., Li, Y., Zhao, D., El-Alfy, ES. (eds) Neural Information Processing. ICONIP 2017. Lecture Notes in Computer Science(), vol 10638. Springer, Cham. https://doi.org/10.1007/978-3-319-70139-4_7
Download citation
DOI: https://doi.org/10.1007/978-3-319-70139-4_7
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-70138-7
Online ISBN: 978-3-319-70139-4
eBook Packages: Computer ScienceComputer Science (R0)