
A First Approach in the Class Noise Filtering Approaches for Fuzzy Subgroup Discovery

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 368)

Abstract

The presence of noise in data is a common and unavoidable problem with several negative consequences. It affects the data collection and data preparation processes in Data Mining applications, where errors commonly occur. The performance of models built under such circumstances depends heavily on the quality of the training data. Hence, problems containing noise are complex, and accurate solutions are often difficult to achieve without specialized techniques. Subgroup discovery (SD), a particular field of supervised learning, has so far overlooked the analysis of noise and its impact on the descriptions obtained. In this paper, the impact of noise on subgroup discovery is analyzed in a comprehensive experimental study, using recent filtering techniques at several class noise levels. Specifically, the analysis is performed with the FuGePSD algorithm, a state-of-the-art SD algorithm based on genetic programming and fuzzy logic.
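The abstract does not describe how class noise is introduced or which filters are applied. As a hedged illustration only, the Python sketch below assumes the widely used uniform class-noise scheme (a fixed fraction of training labels is changed to another class at random) and then applies a classic edited-nearest-neighbour filter in the style of Wilson's ENN rule. The helpers `inject_class_noise` and `enn_filter` are hypothetical names, and ENN is a stand-in chosen for brevity, not necessarily one of the filtering techniques evaluated with FuGePSD in the paper.

```python
import random
from collections import Counter


def inject_class_noise(labels, noise_level, classes, seed=0):
    """Hypothetical helper (assumed uniform class-noise scheme): return a copy
    of `labels` in which a fraction `noise_level` of the examples, chosen
    uniformly at random, gets a different class drawn uniformly from
    `classes`. Requires at least two classes."""
    rng = random.Random(seed)
    noisy = list(labels)  # never modify the caller's list
    n_corrupt = round(noise_level * len(noisy))
    for i in rng.sample(range(len(noisy)), n_corrupt):
        # Pick any class other than the current one, so each selected label really changes.
        alternatives = [c for c in classes if c != noisy[i]]
        noisy[i] = rng.choice(alternatives)
    return noisy


def enn_filter(X, y, k=3):
    """Hypothetical helper in the style of Wilson's edited nearest neighbour
    rule: return the indices of the examples whose label agrees with the
    majority label of their k nearest neighbours (squared Euclidean distance)."""
    keep = []
    for i, xi in enumerate(X):
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(xi, xj)), j)
            for j, xj in enumerate(X)
            if j != i
        )
        neighbour_labels = [y[j] for _, j in dists[:k]]
        majority, _ = Counter(neighbour_labels).most_common(1)[0]
        if majority == y[i]:
            keep.append(i)  # label is consistent with its neighbourhood
    return keep


if __name__ == "__main__":
    X = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    clean = ["a"] * 4 + ["b"] * 4
    noisy = inject_class_noise(clean, noise_level=0.25, classes=["a", "b"], seed=1)
    kept = enn_filter(X, noisy, k=3)
    print("noisy labels:", noisy)
    print("kept indices:", kept)
```

In an experiment of this kind, the noise level would be varied over several values, and the SD algorithm would be run on both the noisy and the filtered training data so that the subgroups obtained can be compared.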


Notes

  1. http://www.keel.es/datasets.php


Acknowledgments

Supported by the Spanish Ministry of Economy and Competitiveness under project TIN2012-33856 (FEDER Funds), the Spanish Ministry of Science and Technology under projects TIN2011-28488 and TIN2010-15055, and by the Regional Projects P10-TIC-6858 and P12-TIC-2958.

Author information


Correspondence to C. J. Carmona.


Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Carmona, C.J., Luengo, J. (2015). A First Approach in the Class Noise Filtering Approaches for Fuzzy Subgroup Discovery. In: Herrero, Á., Sedano, J., Baruque, B., Quintián, H., Corchado, E. (eds) 10th International Conference on Soft Computing Models in Industrial and Environmental Applications. Advances in Intelligent Systems and Computing, vol 368. Springer, Cham. https://doi.org/10.1007/978-3-319-19719-7_34


  • DOI: https://doi.org/10.1007/978-3-319-19719-7_34


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-19718-0

  • Online ISBN: 978-3-319-19719-7

  • eBook Packages: Engineering, Engineering (R0)
