ARCID: A New Approach to Deal with Imbalanced Datasets Classification

Abdellatif, Safa; Ben Hassine, Mohamed Ali; Ben Yahia, Sadok; Bouzeghoub, Amel

doi:10.1007/978-3-319-73117-9_40


Conference paper
First Online: 22 December 2017


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10706)

Abstract

Classification is one of the most fundamental and well-known tasks in data mining. Class imbalance, i.e. the situation where the number of instances belonging to the class of interest (minor class) is much lower than that of the other classes (major classes), is the most challenging issue encountered when performing classification. The problem has become increasingly pronounced as machine learning algorithms are applied to real-world tasks such as medical diagnosis, text classification and fraud detection. Standard classifiers may yield very good results on the majority classes, but they perform poorly on the minority classes because they assume a relatively balanced class distribution and equal misclassification costs. To overcome this problem, we propose in this paper a novel associative classification algorithm called Association Rule-based Classification for Imbalanced Datasets (ARCID). This algorithm aims to extract significant knowledge from imbalanced datasets by emphasizing information extracted from minor classes without drastically impacting the predictive accuracy of the classifier. Experiments on five datasets obtained from the UCI repository have been conducted with reference to four assessment measures. Results show that ARCID outperforms standard algorithms. Furthermore, it is very competitive with Fitcare, a class-imbalance-insensitive algorithm.
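
The gap between majority-class and minority-class performance described in the abstract can be illustrated with a small sketch. It is not taken from the paper and does not implement ARCID; the toy labels, the always-majority classifier and the helper names `accuracy` and `per_class_recall` are assumptions made purely for illustration.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of instances whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true, y_pred):
    """Recall per class: correctly predicted instances / actual instances."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Hypothetical toy data: 95 "major" instances and 5 "minor" instances.
y_true = ["major"] * 95 + ["minor"] * 5
# A trivial classifier that always predicts the majority class.
y_pred = ["major"] * 100

acc = accuracy(y_true, y_pred)
recall = per_class_recall(y_true, y_pred)
print(acc)     # 0.95
print(recall)  # {'major': 1.0, 'minor': 0.0}
```

Under these assumptions the overall accuracy looks high even though no minor-class instance is recognized, which is why evaluations on imbalanced data typically report per-class measures alongside accuracy.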



Author information

Authors and Affiliations

University of Tunis El Manar, Faculty of Sciences of Tunis, LIPAH-LR11ES14, El Manar, 2092, Tunis, Tunisia
Safa Abdellatif, Mohamed Ali Ben Hassine & Sadok Ben Yahia
Institut Mines-TELECOM, TELECOM SudParis, UMR CNRS Samovar, 91011, Evry Cedex, France
Amel Bouzeghoub


Corresponding author

Correspondence to Safa Abdellatif.

Editor information

Editors and Affiliations

Vienna University of Technology, Vienna, Austria
A Min Tjoa
ISAE-ENSMA, Chasseneuil-du-Poitou, France
Ladjel Bellatreche
Vienna University of Technology, Vienna, Austria
Stefan Biffl
Utrecht University, Utrecht, The Netherlands
Jan van Leeuwen
Academy of Sciences, Prague, Czech Republic
Jiří Wiedermann

About this paper

Cite this paper

Abdellatif, S., Ben Hassine, M.A., Ben Yahia, S., Bouzeghoub, A. (2018). ARCID: A New Approach to Deal with Imbalanced Datasets Classification. In: Tjoa, A., Bellatreche, L., Biffl, S., van Leeuwen, J., Wiedermann, J. (eds) SOFSEM 2018: Theory and Practice of Computer Science. SOFSEM 2018. Lecture Notes in Computer Science, vol 10706. Springer, Cham. https://doi.org/10.1007/978-3-319-73117-9_40

DOI: https://doi.org/10.1007/978-3-319-73117-9_40
Published: 22 December 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73116-2
Online ISBN: 978-3-319-73117-9
eBook Packages: Computer Science, Computer Science (R0)
