The influence of noise on the evolutionary fuzzy systems for subgroup discovery

  • Focus
  • Soft Computing

Abstract

External factors such as noise in the data can affect the data mining process. Noise is a common problem with several negative consequences, including errors in data collection and preparation and, above all, in the results obtained by the data mining techniques employed. The capabilities of models built under such circumstances depend heavily on the quality of the training data. Hence, problems containing noise are complex, and accurate solutions are often difficult to achieve. Subgroup discovery, a particular supervised learning field, has overlooked the analysis of noise and its impact on the descriptions obtained. This paper presents an analysis of the impact of noise on the most relevant evolutionary fuzzy systems for subgroup discovery. We also focus on how filtering techniques, devised for predictive tasks, may alleviate the impact of noise on descriptive fields such as subgroup discovery. Specifically, the analysis is carried out using recent filtering techniques for several class noise levels. The results show two different behaviours. On the one hand, the SDIGA and NMEEFSD algorithms produce subgroups of lower quality as the noise increases, which makes noise filtering necessary to compensate for this loss of quality. On the other hand, the FuGePSD algorithm demonstrates a great capacity to work in noisy environments without a preliminary filter. The study is completed with an analysis of interpretability under the influence of noise, focused on the number of rules and variables.
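To make the notion of a class noise level concrete, the following minimal Python sketch corrupts a chosen fraction of the class labels in a data set. The abstract does not specify the noise scheme used in the study, so uniform random label flipping is an assumption here, and the function name `inject_class_noise`, the toy labels and the listed noise levels are all hypothetical illustrations rather than the authors' experimental setup.

```python
import random


def inject_class_noise(labels, noise_level, classes=None, seed=0):
    """Return a copy of `labels` with a fraction `noise_level` of them flipped.

    Assumption (not taken from the paper): each selected example receives a
    different class chosen uniformly at random among the remaining classes.
    """
    if not 0.0 <= noise_level <= 1.0:
        raise ValueError("noise_level must be in [0, 1]")
    rng = random.Random(seed)
    classes = sorted(set(labels)) if classes is None else list(classes)
    if len(classes) < 2:
        raise ValueError("need at least two classes to flip labels")
    noisy = list(labels)
    n_flip = round(noise_level * len(noisy))
    # Pick the examples to corrupt without replacement.
    for i in rng.sample(range(len(noisy)), n_flip):
        alternatives = [c for c in classes if c != noisy[i]]
        noisy[i] = rng.choice(alternatives)
    return noisy


# Hypothetical toy labels and illustrative noise levels.
clean = ["pos"] * 60 + ["neg"] * 40
for level in (0.0, 0.05, 0.10, 0.20):
    noisy = inject_class_noise(clean, level, seed=42)
    changed = sum(a != b for a, b in zip(clean, noisy))
    print(f"noise level {level:.0%}: {changed} of {len(clean)} labels changed")
```

In a comparison of this kind, each noisy copy of the training data would be passed to the subgroup discovery algorithm both with and without a preliminary noise filter, so that the two resulting rule sets can be compared at every noise level.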

Notes

  1. http://www.keel.es/datasets.php.

Acknowledgments

This work was supported by the Spanish Ministry of Economy and Competitiveness under Project TIN2015-68454-R (FEDER Funds), by the Spanish Ministry of Science and Technology under Project TIN2014-57251-P (National Projects) and by the Regional Excellence Projects P11-TIC-7765 and P12-TIC-2958.

Author information

Corresponding author

Correspondence to C. J. Carmona.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Communicated by A. Herrero.

About this article

Cite this article

Luengo, J., García-Vico, A.M., Pérez-Godoy, M.D. et al. The influence of noise on the evolutionary fuzzy systems for subgroup discovery. Soft Comput 20, 4313–4330 (2016). https://doi.org/10.1007/s00500-016-2300-1

  • DOI: https://doi.org/10.1007/s00500-016-2300-1
