Clustering and Weighted Scoring Algorithm Based on Estimating the Number of Clusters

Klikowski, Jakub; Burduk, Robert

doi:10.1007/978-3-030-77967-2_4

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12744)

Included in the following conference series:

International Conference on Computational Science

Abstract

Imbalanced datasets remain a major challenge in data mining and machine learning. Various machine learning methods and their combinations have been considered to improve the quality of classification of imbalanced datasets. This paper presents an approach that uses clustering and a weighted scoring function based on geometric space. In particular, we propose a significant modification to our earlier algorithm. The proposed change concerns the automatic estimation of the number of clusters and the determination of the minimum number of objects in a particular cluster. The proposed algorithm was compared with our earlier proposal and with state-of-the-art algorithms using highly imbalanced datasets. The experiments show that the proposed modification is statistically better for a larger number of reference classifiers than the original algorithm.
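
As a rough illustration of what estimating the number of clusters under a minimum cluster size could look like, the sketch below tries several candidate values of k, rejects partitions that contain a cluster with too few objects, and keeps the k with the highest mean silhouette. The abstract does not say which criterion or clustering method the authors use, so the silhouette criterion, the plain k-means with farthest-first initialisation, and all names here (`farthest_first`, `kmeans`, `silhouette`, `estimate_clusters`, `k_max`, `min_size`) are assumptions made for illustration only. The weighted scoring function itself is not covered.

```python
import math


def farthest_first(points, k):
    """Deterministic seeding: start at the first point, then repeatedly
    add the point farthest from the centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    return centers


def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centers = farthest_first(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest center.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        # Move every center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(
                    tuple(sum(x) / len(members) for x in zip(*members))
                )
            else:
                new_centers.append(centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient over all points."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return -1.0
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton contributes a silhouette of 0
        # a: mean distance to the other members of the same cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for m, other in clusters.items()
            if m != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def estimate_clusters(points, k_max=6, min_size=2):
    """Return the k in 2..k_max with the best silhouette among partitions
    whose smallest cluster has at least min_size objects (1 if none)."""
    best_k, best_score = 1, -1.0
    for k in range(2, min(k_max, len(points) - 1) + 1):
        labels = kmeans(points, k)
        if min(labels.count(c) for c in range(k)) < min_size:
            continue  # partition violates the minimum cluster size
        score = silhouette(points, labels)
        if score > best_score:
            best_k, best_score = k, score
    return best_k
```

Called on a list of coordinate tuples, `estimate_clusters(points)` returns a single integer. Raising `min_size` makes the search reject finer partitions, which matters for highly imbalanced data where a cluster may otherwise contain only one or two minority objects.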

Notes

1. Repository link: https://github.com/w4k2/cws-enc

Acknowledgements

This work was supported by the Polish National Science Centre under the grant No. 2017/25/B/ST6/01750 as well as by the statutory funds of the Department of Systems and Computer Networks, Faculty of Electronics, Wroclaw University of Science and Technology.

Author information

Authors and Affiliations

Department of Systems and Computer Networks, Wroclaw University of Science and Technology, Wroclaw, Poland
Jakub Klikowski & Robert Burduk

Authors

Jakub Klikowski
Robert Burduk

Corresponding author

Correspondence to Robert Burduk.

Editor information

Editors and Affiliations

AGH University of Science and Technology, Krakow, Poland
Maciej Paszynski
Ludwig-Maximilians-Universität München, Munich, Germany
Dieter Kranzlmüller
University of Amsterdam, Amsterdam, The Netherlands
Valeria V. Krzhizhanovskaya
University of Tennessee at Knoxville, Knoxville, TN, USA
Jack J. Dongarra
University of Amsterdam, Amsterdam, The Netherlands
Peter M.A. Sloot

Cite this paper

Klikowski, J., Burduk, R. (2021). Clustering and Weighted Scoring Algorithm Based on Estimating the Number of Clusters. In: Paszynski, M., Kranzlmüller, D., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M. (eds) Computational Science – ICCS 2021. ICCS 2021. Lecture Notes in Computer Science, vol 12744. Springer, Cham. https://doi.org/10.1007/978-3-030-77967-2_4

DOI: https://doi.org/10.1007/978-3-030-77967-2_4
Published: 09 June 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-77966-5
Online ISBN: 978-3-030-77967-2
eBook Packages: Computer Science, Computer Science (R0)
