A Big Data Approach for the Extraction of Fuzzy Emerging Patterns

Cognitive Computation

Abstract

Nowadays, the growth of available data, known as big data, and machine learning techniques are changing our lives. Extracting insights about the phenomena underlying the data is key to improving decision-making processes. Emerging pattern mining describes these phenomena through the discriminative characteristics between the outputs of interest, a very important property in machine learning. However, emerging pattern mining algorithms for big data environments have not yet been widely developed. This paper presents BD-EFEP, the first multi-objective evolutionary algorithm for emerging pattern mining in big data environments. BD-EFEP introduces novelties for emerging pattern mining such as a MapReduce approach that improves the efficiency of evaluating the individuals, and a token-competition-based procedure that boosts the extraction of simple, general and reliable emerging pattern models. An experimental study on datasets with a high number of examples shows the advantages of the proposed algorithm for the emerging pattern mining task in big data problems. The results show that the approach used by BD-EFEP opens new research lines for the extraction of highly descriptive emerging patterns in big data environments.
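The two mechanisms named in the abstract can be illustrated with a minimal sketch. It is not the BD-EFEP implementation: it assumes crisp attribute-value patterns instead of fuzzy ones, plain Python lists instead of a cluster framework, and a toy quality score (true positives minus false positives). All function names and the sample data are hypothetical.

```python
from collections import Counter
from functools import reduce


def covers(pattern, example):
    """A crisp pattern covers an example if every attribute-value pair matches."""
    return all(example.get(attr) == value for attr, value in pattern.items())


def map_partition(patterns, partition, target):
    """Map step: count true/false positives of every pattern on one data partition."""
    counts = Counter()
    for example, label in partition:
        for i, pattern in enumerate(patterns):
            if covers(pattern, example):
                counts[(i, "tp" if label == target else "fp")] += 1
    return counts


def evaluate(patterns, partitions, target):
    """Reduce step: sum the partial counts produced for each partition."""
    partials = (map_partition(patterns, p, target) for p in partitions)
    return reduce(lambda a, b: a + b, partials, Counter())


def token_competition(patterns, data, quality):
    """Keep a pattern only if it covers an example no better pattern has seized."""
    order = sorted(range(len(patterns)), key=lambda i: quality[i], reverse=True)
    seized, survivors = set(), []
    for i in order:
        tokens = {j for j, (ex, _) in enumerate(data) if covers(patterns[i], ex)}
        new_tokens = tokens - seized
        if new_tokens:
            seized |= new_tokens
            survivors.append(i)
    return survivors


# Hypothetical toy data: two partitions of (example, class label) pairs.
partitions = [
    [({"a": 1, "b": 0}, "pos"), ({"a": 1, "b": 1}, "neg")],
    [({"a": 1, "b": 0}, "pos"), ({"a": 0, "b": 0}, "neg")],
]
patterns = [{"a": 1}, {"a": 1, "b": 0}, {"b": 0}, {"b": 1}]

counts = evaluate(patterns, partitions, target="pos")
quality = [counts[(i, "tp")] - counts[(i, "fp")] for i in range(len(patterns))]
data = [pair for partition in partitions for pair in partition]
survivors = token_competition(patterns, data, quality)
```

In this sketch each partition is scanned once for the whole population, so only small count tables travel to the reduce step. The token competition then discards the last pattern, because the only example it covers was already seized by a higher-quality pattern.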

Notes

  1. http://ceatic.ujaen.es/en

Funding

This study was funded by the Spanish Ministry of Economy and Competitiveness under the project TIN2015-68454-R and the FPI 2016 scholarship with reference BES-2016-077738 (FEDER funds).

Author information

Corresponding author

Correspondence to Ángel Miguel García-Vico.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Cite this article

García-Vico, Á.M., González, P., Carmona, C.J. et al. A Big Data Approach for the Extraction of Fuzzy Emerging Patterns. Cogn Comput 11, 400–417 (2019). https://doi.org/10.1007/s12559-018-9612-7

  • DOI: https://doi.org/10.1007/s12559-018-9612-7
