Abstract
Noise in data is a common and largely unavoidable problem: errors frequently arise during the data collection and data preparation stages of Data Mining applications, and they carry several negative consequences. The performance of models built under such circumstances depends heavily on the quality of the training data. Problems containing noise are therefore complex, and accurate solutions are often difficult to achieve without specialized techniques. Subgroup discovery (SD), a particular field of supervised learning, has so far overlooked the analysis of noise and its impact on the descriptions obtained. In this paper, the impact of noise on subgroup discovery is analyzed in a complete experimental study that applies recent filtering techniques at several class noise levels. Specifically, the analysis is performed with the FuGePSD algorithm, a state-of-the-art SD algorithm based on genetic programming and fuzzy logic.
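To make the experimental setting concrete, the following is a minimal sketch of what "class noise at a given level" and "filtering" can mean in practice. It is an illustration under stated assumptions, not the procedure used in the paper: the abstract does not say how noise is injected or which filters are applied. The function names `inject_class_noise` and `knn_filter` are hypothetical, the uniform label-flipping scheme is an assumption, and the nearest-neighbour agreement filter is a simple stand-in for the more recent filters the study refers to.

```python
import random
from collections import Counter


def inject_class_noise(labels, noise_level, classes, seed=0):
    """Return a copy of `labels` where a fraction `noise_level` of the
    examples has its label replaced by a different class, chosen uniformly.
    (Assumed noise scheme; the abstract does not specify one.)"""
    rng = random.Random(seed)
    noisy = list(labels)
    n_noisy = int(round(noise_level * len(noisy)))
    # rng.sample yields distinct indices, so exactly n_noisy labels change
    for i in rng.sample(range(len(noisy)), n_noisy):
        alternatives = [c for c in classes if c != noisy[i]]
        noisy[i] = rng.choice(alternatives)
    return noisy


def knn_filter(points, labels, k=3):
    """Return the indices of examples whose label agrees with the majority
    label of their k nearest neighbours (squared Euclidean distance).
    A simple stand-in filter, not one of the filters studied in the paper."""
    kept = []
    for i, p in enumerate(points):
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(p, q)), j)
            for j, q in enumerate(points)
            if j != i
        )
        votes = Counter(labels[j] for _, j in dists[:k])
        if votes.most_common(1)[0][0] == labels[i]:
            kept.append(i)
    return kept
```

A study of this kind would typically repeat the injection for each noise level of interest, filter the noisy training set, and then run the subgroup discovery algorithm on both the filtered and unfiltered data to compare the resulting descriptions.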
Acknowledgments
Supported by the Spanish Ministry of Economy and Competitiveness under project TIN2012-33856 (FEDER Funds), the Spanish Ministry of Science and Technology under projects TIN2011-28488 and TIN2010-15055, and also by the regional projects P10-TIC-6858 and P12-TIC-2958.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Carmona, C.J., Luengo, J. (2015). A First Approach in the Class Noise Filtering Approaches for Fuzzy Subgroup Discovery. In: Herrero, Á., Sedano, J., Baruque, B., Quintián, H., Corchado, E. (eds) 10th International Conference on Soft Computing Models in Industrial and Environmental Applications. Advances in Intelligent Systems and Computing, vol 368. Springer, Cham. https://doi.org/10.1007/978-3-319-19719-7_34
DOI: https://doi.org/10.1007/978-3-319-19719-7_34
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19718-0
Online ISBN: 978-3-319-19719-7
eBook Packages: Engineering, Engineering (R0)