Multiclass Imbalanced Classification Using Fuzzy C-Mean and SMOTE with Fuzzy Support Vector Machine

  • Conference paper
Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10638)

Abstract

A hybrid sampling technique, FCMSMT, is proposed that combines Fuzzy C-Mean clustering with the Synthetic Minority Oversampling Technique (SMOTE) to tackle the imbalanced multiclass classification problem. The mean number of instances per class is used as the target number of instances for both undersampling and oversampling. Fixing the required number of instances for each class at this mean can prevent within-class imbalanced data from being eliminated erroneously during undersampling. The technique can decrease both within-class and between-class errors and can thus increase classification performance. The study was conducted on eight benchmark datasets from the KEEL and UCI repositories, and the results were compared against three major classifiers based on G-mean and AUC measurements. The results reveal that the proposed technique could handle most of the multiclass imbalanced datasets used in the experiments for all classifiers while retaining the integrity of the original data.
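The resampling idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not say how the fuzzy c-means result is used to choose which majority instances to keep, so the rule below (keep the instances with the strongest cluster membership) is an assumption, as are all function names, the cluster count `n_clusters`, the neighbour count `k`, and the simplified SMOTE-style interpolation.

```python
import numpy as np

def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, seed=0):
    """Plain fuzzy c-means; returns (centers, memberships)."""
    rng = np.random.default_rng(seed)
    u = rng.random((len(X), n_clusters))
    u /= u.sum(axis=1, keepdims=True)          # each row sums to 1
    for _ in range(n_iter):
        w = u ** m
        centers = (w.T @ X) / w.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        inv = d ** (-2.0 / (m - 1.0))
        u = inv / inv.sum(axis=1, keepdims=True)
    return centers, u

def undersample_fcm(X, target, n_clusters=3, seed=0):
    """Keep `target` rows of X.

    ASSUMPTION: rows with the highest maximum membership degree are kept;
    the paper's actual selection rule is not given in the abstract.
    """
    _, u = fuzzy_c_means(X, n_clusters, seed=seed)
    keep = np.argsort(-u.max(axis=1))[:target]
    return X[np.sort(keep)]

def oversample_smote(X, target, k=5, seed=0):
    """Grow X to `target` rows by SMOTE-style interpolation.

    Simplified sketch: needs at least two rows in X.
    """
    rng = np.random.default_rng(seed)
    n_new = target - len(X)
    k = min(k, len(X) - 1)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                # a point is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, len(X), size=n_new)
    nb = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))               # position along the segment
    return np.vstack([X, X[base] + gap * (X[nb] - X[base])])

def fcmsmt_resample(X, y, n_clusters=3, k=5, seed=0):
    """Resample every class to the mean class size."""
    classes, counts = np.unique(y, return_counts=True)
    target = int(round(float(counts.mean())))
    Xs, ys = [], []
    for c, n in zip(classes, counts):
        Xc = X[y == c]
        if n > target:
            Xc = undersample_fcm(Xc, target, n_clusters, seed)
        elif n < target:
            Xc = oversample_smote(Xc, target, k, seed)
        Xs.append(Xc)
        ys.append(np.full(len(Xc), c))
    return np.vstack(Xs), np.concatenate(ys)
```

Calling `fcmsmt_resample(X, y)` on a feature matrix and label vector returns a dataset in which every class has the mean class size; the balanced data would then be passed to a classifier, such as the fuzzy support vector machine named in the title.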

Author information

Corresponding author

Correspondence to Ratchakoon Pruengkarn.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Pruengkarn, R., Wong, K.W., Fung, C.C. (2017). Multiclass Imbalanced Classification Using Fuzzy C-Mean and SMOTE with Fuzzy Support Vector Machine. In: Liu, D., Xie, S., Li, Y., Zhao, D., El-Alfy, ES. (eds) Neural Information Processing. ICONIP 2017. Lecture Notes in Computer Science, vol 10638. Springer, Cham. https://doi.org/10.1007/978-3-319-70139-4_7

  • DOI: https://doi.org/10.1007/978-3-319-70139-4_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-70138-7

  • Online ISBN: 978-3-319-70139-4

  • eBook Packages: Computer Science, Computer Science (R0)
