Abstract
Nowadays, the growth of available data, known as big data, together with machine learning techniques, is changing our lives. Extracting insights into the phenomena underlying the data is key to improving decision-making processes. Emerging pattern mining describes these underlying phenomena through the characteristics that discriminate between the outputs of interest, which is a very important capability in machine learning. However, emerging pattern mining algorithms for big data environments have not yet been widely developed. This paper presents BD-EFEP, the first multi-objective evolutionary algorithm for emerging pattern mining in big data environments. BD-EFEP introduces novelties for emerging pattern mining such as a MapReduce approach that improves the efficiency of the evaluation of individuals, and a token-competition-based procedure that boosts the extraction of simple, general and reliable emerging pattern models. The experimental study, performed on datasets with a high number of examples, shows the advantages of the proposed algorithm for the emerging pattern mining task in big data problems. The results show that the approach used by BD-EFEP opens new research lines for the extraction of highly descriptive emerging patterns in big data environments.
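The two mechanisms named above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation; it assumes a simplified crisp rule encoding (BD-EFEP itself works with fuzzy rules), hypothetical helper names (`map_partition`, `sum_counts`, `evaluate`, `token_competition`), quality measures computed from a per-rule confusion matrix, and a token competition in which rules are ranked by confidence and a "token" is a covered example of the rule's target class. In a Spark setting, the same map/reduce shape could be expressed with `RDD.mapPartitions` followed by `RDD.reduce`.

```python
from functools import reduce
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical, simplified encoding (not BD-EFEP's actual representation):
# a pattern is a conjunction of crisp "attribute index == value" conditions
# plus the target class it describes. BD-EFEP itself uses fuzzy rules.
Rule = Tuple[Dict[int, str], str]
Example = Tuple[Tuple[str, ...], str]  # (attribute values, class label)
Counts = List[int]  # [tp, fp, fn, tn]


def covers(rule: Rule, features: Sequence[str]) -> bool:
    """True when every condition of the rule holds for the example."""
    conditions, _ = rule
    return all(features[i] == value for i, value in conditions.items())


def map_partition(rules: List[Rule], partition: List[Example]) -> List[Counts]:
    """Map step: confusion counts of every rule on one data partition."""
    result = []
    for rule in rules:
        tp = fp = fn = tn = 0
        for features, label in partition:
            hit = covers(rule, features)
            positive = label == rule[1]
            if hit and positive:
                tp += 1
            elif hit:
                fp += 1
            elif positive:
                fn += 1
            else:
                tn += 1
        result.append([tp, fp, fn, tn])
    return result


def sum_counts(a: List[Counts], b: List[Counts]) -> List[Counts]:
    """Reduce step: element-wise sum of two partial results."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def evaluate(rules: List[Rule], partitions: List[List[Example]]) -> List[Counts]:
    """Evaluate the whole population with one map pass and one reduce."""
    return reduce(sum_counts, (map_partition(rules, p) for p in partitions))


def growth_rate(tp: int, fp: int, fn: int, tn: int) -> float:
    """Support of the pattern in the target class divided by its support elsewhere."""
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    if tpr == 0.0:
        return 0.0
    if fpr == 0.0:
        return float("inf")  # a "jumping" emerging pattern
    return tpr / fpr


def confidence(tp: int, fp: int, fn: int, tn: int) -> float:
    return tp / (tp + fp) if tp + fp else 0.0


def token_competition(rules: List[Rule], counts: List[Counts],
                      data: List[Example]) -> List[Rule]:
    """Keep only rules that seize at least one token not taken by a better rule.

    Assumption: a token is a covered example of the rule's target class;
    rules are ranked by confidence, and ties keep their input order.
    """
    order = sorted(range(len(rules)), key=lambda i: confidence(*counts[i]),
                   reverse=True)
    seized: Set[int] = set()
    kept = []
    for i in order:
        target = rules[i][1]
        tokens = {j for j, (features, label) in enumerate(data)
                  if label == target and covers(rules[i], features)} - seized
        if tokens:
            seized |= tokens
            kept.append(rules[i])
    return kept


if __name__ == "__main__":
    data = [(("sunny", "high"), "no"), (("sunny", "normal"), "yes"),
            (("rainy", "high"), "no"), (("rainy", "normal"), "yes"),
            (("sunny", "high"), "no"), (("overcast", "normal"), "yes")]
    rules = [({1: "normal"}, "yes"), ({0: "sunny", 1: "normal"}, "yes"),
             ({1: "high"}, "no"), ({0: "sunny"}, "no")]
    counts = evaluate(rules, [data[:3], data[3:]])  # two toy "partitions"
    print(counts)
    print(token_competition(rules, counts, data))
```

On this toy data, the reduced counts are identical to a single pass over the whole dataset, because summing confusion matrices is associative and commutative; this is what lets the evaluation be split across partitions. Token competition then discards `({0: "sunny", 1: "normal"}, "yes")` and `({0: "sunny"}, "no")`, since every example they correctly cover has already been seized by a higher-ranked rule, leaving a smaller and more general pattern set.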


Funding
This study was funded by the Spanish Ministry of Economy and Competitiveness under project TIN2015-68454-R and FPI 2016 Scholarship reference BES-2016-077738 (FEDER Funds).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
About this article
Cite this article
García-Vico, Á.M., González, P., Carmona, C.J. et al. A Big Data Approach for the Extraction of Fuzzy Emerging Patterns. Cogn Comput 11, 400–417 (2019). https://doi.org/10.1007/s12559-018-9612-7
DOI: https://doi.org/10.1007/s12559-018-9612-7