Abstract
With the volume of data growing at an unprecedented rate, knowledge acquisition from big data has become a new challenge. To address this issue, information granules are constructed in hierarchical decision tables at different levels of granularity. The changes in the support, confidence, and coverage of hierarchical decision rules across these levels are then analyzed to explain the relationships between the condition granules and the decision granules. Four strategies for attribute level ascension are designed; in most cases, ascending attribute levels reduces the number of decision rules. An efficient parallel knowledge acquisition framework for big data is proposed and implemented using MapReduce. The experimental results demonstrate that the proposed algorithms can mine hierarchical decision rules from big data at different levels of granularity.
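The abstract does not define its measures or its ascension strategies, so the following is only a minimal sketch under stated assumptions. It assumes the usual rough-set definitions for a rule "condition granule X implies decision granule Y": support = |X ∩ Y|, confidence = |X ∩ Y| / |X|, and coverage = |X ∩ Y| / |Y|. The toy table, the concept hierarchies, and the names `ascend`, `map_phase`, `reduce_phase`, and `rule_measures` are all hypothetical, and the map and reduce steps are simulated in-process rather than run on a MapReduce cluster.

```python
from collections import Counter
from functools import reduce

# Hypothetical toy decision table: (condition attribute values, decision value).
TABLE = [
    (("Nanjing", "23"), "yes"),
    (("Suzhou", "27"), "yes"),
    (("Nanjing", "41"), "no"),
    (("Hangzhou", "25"), "yes"),
    (("Hangzhou", "45"), "no"),
    (("Suzhou", "48"), "yes"),
]

# Hypothetical one-level concept hierarchies, one mapping per condition attribute.
HIERARCHY = [
    {"Nanjing": "Jiangsu", "Suzhou": "Jiangsu", "Hangzhou": "Zhejiang"},
    {"23": "young", "25": "young", "27": "young",
     "41": "middle", "45": "middle", "48": "middle"},
]


def ascend(table, attr_index, hierarchy=HIERARCHY):
    """Replace the values of one condition attribute by their parent concepts."""
    mapping = hierarchy[attr_index]
    return [
        (tuple(mapping[v] if i == attr_index else v for i, v in enumerate(cond)), dec)
        for cond, dec in table
    ]


def map_phase(split):
    """Map step: count (condition granule, decision value) pairs in one split."""
    return Counter((cond, dec) for cond, dec in split)


def reduce_phase(left, right):
    """Reduce step: merge the partial counts of two splits."""
    return left + right


def rule_measures(table, n_splits=2):
    """Return support, confidence and coverage for every rule in the table."""
    splits = [table[i::n_splits] for i in range(n_splits)]
    joint = reduce(reduce_phase, map(map_phase, splits), Counter())

    cond_size = Counter()  # |X| for each condition granule
    dec_size = Counter()   # |Y| for each decision granule
    for (cond, dec), n in joint.items():
        cond_size[cond] += n
        dec_size[dec] += n

    return {
        (cond, dec): {
            "support": n,
            "confidence": n / cond_size[cond],
            "coverage": n / dec_size[dec],
        }
        for (cond, dec), n in joint.items()
    }


if __name__ == "__main__":
    base = rule_measures(TABLE)
    coarse = rule_measures(ascend(ascend(TABLE, 0), 1))
    print(len(base), "rules at the base level")
    print(len(coarse), "rules after ascending both attributes")
```

On this toy table, ascending both attributes merges objects into coarser condition granules, so the rule count drops from six to five. One coarse granule becomes inconsistent, and the confidence of its rules falls below 1, which is the kind of measure change the abstract refers to.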










Acknowledgements
The research is supported by the National Natural Science Foundation of China under Grant No. 61573235, the Natural Science Foundation of Jiangsu Province under Grant No. BK20141152, the Humanity and Social Science Youth Foundation of Ministry of Education of China under Grant No. 15YJCZH129, the Qing Lan Project of Jiangsu Province of China, the Jiangsu Key Laboratory of Big Data Analysis Technology/B-DAT (Nanjing University of Information Science & Technology) under Grant No. KXK1402, and the Key Laboratory of Cloud Computing and Intelligent Information Processing of Changzhou City under Grant No. CM20123004.
Additional information
This is an extended version of the paper presented at the 2015 IEEE International Conference on Machine Learning and Cybernetics, Guangzhou, China.
Cite this article
Qian, J., Xia, M. & Yue, X. Parallel knowledge acquisition algorithms for big data using MapReduce. Int. J. Mach. Learn. & Cyber. 9, 1007–1021 (2018). https://doi.org/10.1007/s13042-016-0624-x