Abstract
The recent emergence of massive amounts of data requires new algorithms capable of processing them in an acceptable time frame. Several proposals have been made, and all of them share a common idea: break down the entire set of examples into smaller subsets, process each subset with a learning algorithm, and then combine the partial results. Most of these models rely on parallel processing, in which a learning algorithm learns independently from each subset of data. In this work, we propose a new model for obtaining fuzzy rule-based classifiers that uses sequential processing to handle a large number of examples, and we show that, for some problems, a sequential procedure can be competitive in time and learning capacity with parallel proposals based on the MapReduce paradigm. The sequential processing uses a batch-incremental learning technique that processes the subsets of examples one after another. The incremental proposal relies on a biologically inspired computation method: a cognitive computational model that uses genetic algorithms to learn fuzzy rules. The experiments carried out show that the incremental model is competitive with a parallel model proposed for big data classification using fuzzy rules.
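As a rough illustration of the batch-incremental idea described above, the following sketch splits a set of examples into consecutive subsets and lets a single model be refined by each subset in sequence. It is not the method of the paper: the learner here is a simple per-class centroid model used as a stand-in for the genetic fuzzy rule learner, and the names `batches`, `CentroidModel`, and `learn_incrementally` are hypothetical.

```python
from collections import defaultdict


def batches(examples, size):
    """Yield consecutive subsets (batches) of the example list."""
    for start in range(0, len(examples), size):
        yield examples[start:start + size]


class CentroidModel:
    """Stand-in learner (assumption): keeps a running mean per class.

    The paper learns fuzzy rules with a genetic algorithm instead.
    """

    def __init__(self):
        self.sums = {}
        self.counts = defaultdict(int)

    def update(self, batch):
        """Refine the current model with one batch; earlier batches are not revisited."""
        for features, label in batch:
            if label not in self.sums:
                self.sums[label] = [0.0] * len(features)
            self.sums[label] = [s + x for s, x in zip(self.sums[label], features)]
            self.counts[label] += 1

    def predict(self, features):
        """Return the class whose centroid is closest to the given features."""
        def squared_distance(label):
            n = self.counts[label]
            return sum((x - s / n) ** 2 for x, s in zip(features, self.sums[label]))
        return min(self.sums, key=squared_distance)


def learn_incrementally(examples, batch_size):
    """Process the batches one after another, updating a single model."""
    model = CentroidModel()
    for batch in batches(examples, batch_size):
        model.update(batch)
    return model


# Toy data (made up for illustration): two well-separated classes.
data = [([0.0, 0.1], "a"), ([0.2, 0.0], "a"), ([1.0, 0.9], "b"),
        ([0.9, 1.1], "b"), ([0.1, 0.2], "a"), ([1.1, 1.0], "b")]
model = learn_incrementally(data, batch_size=2)
print(model.predict([0.05, 0.05]), model.predict([1.0, 1.0]))
```

In a parallel MapReduce-style scheme, by contrast, a separate model would be learned from each subset independently and the partial models combined afterwards; in the sequential sketch only one model exists and each batch sees the state left by the previous ones.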
Acknowledgements
The authors would like to thank the research group Soft Computing and Intelligent Information Systems (SCi2S) [http://sci2s.ugr.es] for their collaboration and for granting us access to the cluster used for the experimental part of this paper.
Funding
This work has been partially funded by the Spanish MEC Projects TIN2015-71618-R and DPI2015-69585-R, and co-financed by FEDER funds (European Union).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical Approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
González, A., Pérez, R. & Romero-Zaliz, R. An Incremental Approach to Address Big Data Classification Problems Using Cognitive Models. Cogn Comput 11, 347–366 (2019). https://doi.org/10.1007/s12559-019-09655-x
DOI: https://doi.org/10.1007/s12559-019-09655-x