Improvement of subgroup descriptions in noisy data by detecting exceptions

  • Regular Paper
  • Progress in Artificial Intelligence

Abstract

Noise in the datasets to which data mining techniques are applied can greatly reduce the quality and interest of the extracted knowledge. Subgroup discovery, a supervised descriptive rule discovery technique, is not exempt from this problem. This paper aims to improve the descriptions of subgroups previously obtained by any subgroup discovery algorithm in noisy datasets. It does so with the post-processing approach of the MEFES algorithm, which first detects exceptions in the input subgroups and then includes those exceptions in their descriptions. Experiments on noisy datasets show that the proposal is suitable for improving the quality of the results.
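
The post-processing idea in the abstract (detect an exception inside a subgroup, then add it to the description) can be illustrated with a toy sketch. This is not the MEFES algorithm: MEFES searches for exceptions with an evolutionary algorithm, whereas the sketch below simply tries a hand-supplied list of candidate exceptions and keeps the one that most improves confidence. The names `Subgroup`, `confidence`, `add_best_exception` and the toy dataset are all hypothetical, and confidence is assumed here as the quality measure only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Example = Dict[str, str]
Condition = Callable[[Example], bool]


@dataclass
class Subgroup:
    """Rule of the form: IF condition [EXCEPT exception] THEN target."""
    condition: Condition
    target: str
    exception: Optional[Condition] = None

    def covers(self, ex: Example) -> bool:
        # An example is covered if it satisfies the condition
        # and does not fall under the exception (when one is set).
        if not self.condition(ex):
            return False
        return self.exception is None or not self.exception(ex)


def confidence(sg: Subgroup, data: List[Example], label: str = "class") -> float:
    """Fraction of covered examples whose label equals the subgroup target."""
    covered = [ex for ex in data if sg.covers(ex)]
    if not covered:
        return 0.0
    hits = sum(1 for ex in covered if ex[label] == sg.target)
    return hits / len(covered)


def add_best_exception(sg: Subgroup, data: List[Example],
                       candidates: List[Condition]) -> Subgroup:
    """Exhaustive stand-in for the exception search (assumption, not MEFES):
    return the subgroup extended with the candidate exception that gives the
    highest confidence, or the original subgroup if none improves it."""
    best, best_conf = sg, confidence(sg, data)
    for cand in candidates:
        trial = Subgroup(sg.condition, sg.target, cand)
        trial_conf = confidence(trial, data)
        if trial_conf > best_conf:
            best, best_conf = trial, trial_conf
    return best


# Toy data (fabricated for illustration): one "young" example contradicts
# the target class, as a noisy or exceptional record would.
DATA: List[Example] = [
    {"age": "young", "smoker": "no", "class": "healthy"},
    {"age": "young", "smoker": "no", "class": "healthy"},
    {"age": "young", "smoker": "no", "class": "healthy"},
    {"age": "young", "smoker": "yes", "class": "ill"},
    {"age": "old", "smoker": "no", "class": "ill"},
]

base = Subgroup(lambda ex: ex["age"] == "young", "healthy")
refined = add_best_exception(base, DATA, [lambda ex: ex["smoker"] == "yes"])
```

On this toy data the original subgroup covers four examples, three of them with the target class; once the exception is included in the description, the contradicting example is no longer covered, so the refined subgroup describes only examples of the target class.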

Notes

  1. http://www.keel.es.

  2. http://sci2s.ugr.es/sicidm/.

  3. http://simidat.ujaen.es/papers/SD-Noisy.

Acknowledgements

This paper was supported by the Spanish Ministry of Economy and Competitiveness under Project TIN2015-68454-R (FEDER Funds).

Author information

Corresponding author

Correspondence to Pedro González.

About this article

Cite this article

González, P., García-Vico, Á.M., Carmona, C.J. et al. Improvement of subgroup descriptions in noisy data by detecting exceptions. Prog Artif Intell 7, 55–64 (2018). https://doi.org/10.1007/s13748-017-0131-7

  • DOI: https://doi.org/10.1007/s13748-017-0131-7
