Abstract
The presence of noise in datasets to which data mining techniques are applied can greatly reduce the quality and interest of the extracted knowledge. Subgroup discovery, a supervised descriptive rule discovery technique, is not exempt from this problem. The aim of this paper is to improve the descriptions of subgroups previously obtained from noisy datasets by any subgroup discovery algorithm. This is achieved with the post-processing approach of the MEFES algorithm, which first detects exceptions in the input subgroups and then incorporates those exceptions into the subgroup descriptions. Experiments performed on noisy datasets show that the proposal is suitable for improving the quality of the results.
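To make "incorporating exceptions into a subgroup description" concrete, the following Python sketch treats a subgroup as a predicate over examples and refines it to "subgroup AND NOT exception". It is a hypothetical illustration, not the MEFES algorithm. The helper names (`confidence`, `with_exception`, `best_exception`), the use of confidence as the quality measure, the `min_covered` threshold, and the exhaustive search over single attribute-value exceptions (MEFES itself performs an evolutionary search) are all assumptions made for this example. The toy data is invented.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Example = Dict[str, str]
Condition = Callable[[Example], bool]


def confidence(data: List[Example], cond: Condition, target: str, value: str) -> float:
    """Share of the examples covered by `cond` whose target equals `value` (0.0 if none)."""
    covered = [e for e in data if cond(e)]
    if not covered:
        return 0.0
    return sum(e[target] == value for e in covered) / len(covered)


def with_exception(subgroup: Condition, exception: Condition) -> Condition:
    """Refined description: 'subgroup AND NOT exception'."""
    return lambda e: subgroup(e) and not exception(e)


def best_exception(
    data: List[Example],
    subgroup: Condition,
    target: str,
    value: str,
    attributes: Sequence[str],
    min_covered: int = 2,
) -> Tuple[Optional[Tuple[str, str]], float]:
    """Exhaustive search over single 'attr = val' exceptions (NOT the MEFES evolutionary search).

    Returns the exception that most increases confidence while the refined
    description still covers at least `min_covered` examples; returns
    (None, original confidence) if no candidate improves on the subgroup.
    """
    best: Optional[Tuple[str, str]] = None
    best_conf = confidence(data, subgroup, target, value)
    # Candidate exceptions: attribute values observed inside the subgroup.
    candidates = sorted({(a, e[a]) for e in data if subgroup(e) for a in attributes})
    for attr, val in candidates:
        # Default arguments bind the current attr/val to this lambda.
        refined = with_exception(subgroup, lambda e, a=attr, v=val: e[a] == v)
        n_covered = sum(1 for e in data if refined(e))
        conf = confidence(data, refined, target, value)
        if n_covered >= min_covered and conf > best_conf:
            best, best_conf = (attr, val), conf
    return best, best_conf


# Hypothetical toy data: the fourth example contradicts 'smoker = yes -> risk = high'.
TOY_DATA: List[Example] = [
    {"smoker": "yes", "exercise": "no", "risk": "high"},
    {"smoker": "yes", "exercise": "no", "risk": "high"},
    {"smoker": "yes", "exercise": "no", "risk": "high"},
    {"smoker": "yes", "exercise": "yes", "risk": "low"},
    {"smoker": "no", "exercise": "yes", "risk": "low"},
]


def is_smoker(e: Example) -> bool:
    return e["smoker"] == "yes"


if __name__ == "__main__":
    print(confidence(TOY_DATA, is_smoker, "risk", "high"))  # 0.75
    print(best_exception(TOY_DATA, is_smoker, "risk", "high", ["exercise"]))
    # (('exercise', 'yes'), 1.0)
```

On this hypothetical data the subgroup `smoker = yes` has confidence 0.75 for `risk = high`; excluding the exception `exercise = yes` raises it to 1.0 while the refined description still covers three examples. The `min_covered` guard exists because, without it, an exception that leaves a single covered example would trivially reach confidence 1.0.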
Acknowledgements
This paper was supported by the Spanish Ministry of Economy and Competitiveness under Project TIN2015-68454-R (FEDER funds).
Cite this article
González, P., García-Vico, Á.M., Carmona, C.J. et al. Improvement of subgroup descriptions in noisy data by detecting exceptions. Prog Artif Intell 7, 55–64 (2018). https://doi.org/10.1007/s13748-017-0131-7