Abstract
With the increasing availability of large amounts of information and the benefits of data processing, big data have gained considerable significance in recent years. Because of the scalable nature of the data, big data applications are processed using the MapReduce programming model. However, applying rule-based models to such datasets is not straightforward, and big data are not classified efficiently. To overcome these problems, a parallel linguistic fuzzy rule with canopy MapReduce (LFR-CM) framework is introduced. The LFR-CM framework classifies big data using a canopy MapReduce function for information sharing in the cloud, with higher classification accuracy and lower time consumption. It comprises three steps for efficient classification in a cloud environment. First, it constructs the fuzzy knowledge base (KB) from the big data training set, where the linguistic fuzzy rules are constructed. The second step has three operations. The first operation is the map function, which every cloud user runs in parallel without transmitting any data to other cloud user nodes. The second operation processes the data through the map function across all the other cloud user nodes. The third operation is the reduce function, which each cloud user applies to the partitioned information. Finally, the data are classified with higher classification accuracy and lower time consumption. The LFR-CM framework is implemented and evaluated on Amazon EC2 cloud big data datasets and compared with other MapReduce-based classification systems in terms of runtime, classification time, classification accuracy and input/output cost. The results indicate that the LFR-CM framework is more efficient than the existing methods.
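The map and reduce operations described in the abstract can be illustrated with a minimal single-machine sketch. It is not the LFR-CM implementation: the canopy step is omitted, features are assumed to be normalized to [0, 1], and the three triangular labels, the simplified rule weight, and every function name (`map_partition`, `reduce_rules`, `classify`) are hypothetical choices made only for illustration.

```python
from collections import defaultdict
from functools import reduce

# Assumed linguistic labels: triangular fuzzy sets on [0, 1] (illustrative choice).
LABELS = ("low", "medium", "high")
PEAKS = {"low": 0.0, "medium": 0.5, "high": 1.0}

def membership(x, label):
    """Triangular membership with its peak at PEAKS[label] and half-width 0.5."""
    return max(0.0, 1.0 - abs(x - PEAKS[label]) / 0.5)

def best_labels(features):
    """Antecedent of a rule: the best-matching label for each feature."""
    return tuple(max(LABELS, key=lambda l: membership(x, l)) for x in features)

def matching_degree(features, antecedent):
    """Product of the memberships of each feature in its antecedent label."""
    degree = 1.0
    for x, label in zip(features, antecedent):
        degree *= membership(x, label)
    return degree

def map_partition(partition):
    """Map step: uses only the local partition, so no data leaves the node."""
    local = defaultdict(lambda: defaultdict(float))
    for features, cls in partition:
        antecedent = best_labels(features)
        local[antecedent][cls] += matching_degree(features, antecedent)
    return local

def merge(a, b):
    """Add the per-class matching degrees of b into a."""
    for antecedent, per_class in b.items():
        for cls, degree in per_class.items():
            a[antecedent][cls] += degree
    return a

def reduce_rules(partials):
    """Reduce step: merge partial results and keep one weighted rule per antecedent."""
    merged = reduce(merge, partials, defaultdict(lambda: defaultdict(float)))
    kb = {}
    for antecedent, per_class in merged.items():
        total = sum(per_class.values())
        if total == 0.0:
            continue
        cls, top = max(per_class.items(), key=lambda kv: kv[1])
        # Simplified weight (assumption): support for the winning class minus the rest.
        weight = (top - (total - top)) / total
        if weight > 0:
            kb[antecedent] = (cls, weight)
    return kb

def classify(kb, features):
    """Winning-rule inference: highest matching degree times rule weight."""
    best_cls, best_score = None, 0.0
    for antecedent, (cls, weight) in kb.items():
        score = matching_degree(features, antecedent) * weight
        if score > best_score:
            best_cls, best_score = cls, score
    return best_cls

# Two toy partitions standing in for the data held by two cloud user nodes.
partitions = [
    [((0.1, 0.2), "A"), ((0.9, 0.8), "B")],
    [((0.0, 0.1), "A"), ((1.0, 0.9), "B")],
]
kb = reduce_rules(map_partition(p) for p in partitions)
print(kb)
print(classify(kb, (0.2, 0.1)), classify(kb, (0.8, 0.95)))
```

In this sketch each partition is mapped independently and only the aggregated rule statistics reach the reduce step, which mirrors the abstract's point that the map function runs without transmitting data between nodes.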
Abbreviations
- \(\hbox{CS}\): Cloud servers
- \(\hbox{CU}\): Cloud users
- \(R_{i}\): Fuzzy rules
- \(P_{i}^{1}\): Antecedent fuzzy set
- \(C_{i}\): Class label
- \(\hbox{RW}_{i}\): Rule weight
- \(a_{p}\): Membership function
- \(C_{\rm mn}\): Cloud master node
- \(\hbox{MAP}\): Map function
- \(\hbox{FM}_{\text{i}}\): Mapping threshold factor
- \({\text{DS}}_{\rm{i}}\): Training set
- \({\text{CT}}\): Classification time
- \({\text{A}}_{\rm{i}}\): Classification accuracy
- \({\text{DCC}}\): Data correctly classified
- N: Number of data
- \(\hbox{KB}\): Knowledge base
- n: Number of instances
- \(C_{\rm wn}\): Cloud worker nodes
Cite this article
Vennila, V., Kannan, A.R. Hybrid Parallel Linguistic Fuzzy Rules with Canopy MapReduce for Big Data Classification in Cloud. Int. J. Fuzzy Syst. 21, 809–822 (2019). https://doi.org/10.1007/s40815-018-0597-x
DOI: https://doi.org/10.1007/s40815-018-0597-x