Parallel knowledge acquisition algorithms for big data using MapReduce

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

With the volume of data growing at an unprecedented rate, knowledge acquisition from big data has become a new challenge. To address this issue, information granules are constructed in decision tables at different hierarchical levels. Changes in the quantitative measures of support, confidence and coverage associated with hierarchical decision rules are further discussed to explain the relationships between condition granules and decision granules. Four different strategies for attribute level ascension are designed; in most cases, attribute level ascension reduces the number of decision rules. An efficient parallel knowledge acquisition framework for big data using MapReduce is proposed and implemented. The experimental results demonstrate that the proposed algorithms can mine hierarchical decision rules at different levels of granularity from big data.
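The abstract does not give formulas, so the following is only a minimal sketch under stated assumptions, not the authors' algorithm. It assumes the usual rough-set definitions for a rule X → Y built from a condition granule X and a decision granule Y over a universe U: support = |X ∩ Y| / |U|, confidence = |X ∩ Y| / |X|, coverage = |X ∩ Y| / |Y|. It simulates one MapReduce pass in plain Python (it does not use the Hadoop API) and shows how ascending one attribute through a concept hierarchy merges granules and can reduce the number of rules. The functions `map_phase`, `shuffle`, `reduce_phase`, `ascend` and `mine`, as well as the toy table and hierarchy, are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical row format: (tuple of condition-attribute values, decision value).
Row = Tuple[Tuple[str, ...], str]
Key = Tuple[Tuple[str, ...], str]  # (condition granule, decision granule)


def map_phase(rows: Iterable[Row]) -> Iterator[Tuple[Key, int]]:
    """Mapper: emit ((condition granule, decision granule), 1) per object."""
    for cond, dec in rows:
        yield (cond, dec), 1


def shuffle(pairs: Iterable[Tuple[Key, int]]) -> Dict[Key, List[int]]:
    """Group intermediate values by key, as a MapReduce runtime would."""
    groups: Dict[Key, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[Key, List[int]]) -> Dict[Key, int]:
    """Reducer: compute |X ∩ Y| for each (condition, decision) granule pair."""
    return {key: sum(values) for key, values in groups.items()}


def rule_measures(counts: Dict[Key, int]) -> Dict[Key, Tuple[float, float, float]]:
    """Return (support, confidence, coverage) for every rule X -> Y."""
    n = sum(counts.values())  # |U|
    cond_size: Dict[Tuple[str, ...], int] = defaultdict(int)  # |X|
    dec_size: Dict[str, int] = defaultdict(int)  # |Y|
    for (cond, dec), c in counts.items():
        cond_size[cond] += c
        dec_size[dec] += c
    return {
        (cond, dec): (c / n, c / cond_size[cond], c / dec_size[dec])
        for (cond, dec), c in counts.items()
    }


def ascend(rows: Iterable[Row], hierarchies: List[Dict[str, str]]) -> List[Row]:
    """Replace each condition value by its parent concept (unchanged if none)."""
    return [
        (tuple(h.get(v, v) for v, h in zip(cond, hierarchies)), dec)
        for cond, dec in rows
    ]


def mine(rows: Iterable[Row]) -> Dict[Key, Tuple[float, float, float]]:
    """One simulated MapReduce pass followed by rule-measure computation."""
    return rule_measures(reduce_phase(shuffle(map_phase(rows))))


# Hypothetical toy table: condition attributes (age band, income), decision "buy".
table: List[Row] = [
    (("18-25", "low"), "no"),
    (("26-35", "low"), "no"),
    (("36-50", "high"), "yes"),
    (("51-65", "high"), "yes"),
    (("18-25", "high"), "yes"),
    (("51-65", "low"), "no"),
]
# Hypothetical concept hierarchy: ascend only the age attribute by one level.
age_up = {"18-25": "young", "26-35": "young", "36-50": "old", "51-65": "old"}

level0 = mine(table)
level1 = mine(ascend(table, [age_up, {}]))
print(len(level0), len(level1))  # 6 4
for rule, (sup, conf, cov) in sorted(level1.items()):
    print(rule, round(sup, 3), round(conf, 3), round(cov, 3))
```

At the original level, every object forms its own condition granule, which gives six rules. After ascending age to {young, old}, granules merge and four rules remain. For example, (young, low) → no has support 2/6, confidence 1.0 and coverage 2/3. In a real cluster, |X| and |Y| would typically need their own aggregation step rather than being derived in a single process as they are here.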

Acknowledgements

The research is supported by the National Natural Science Foundation of China under Grant No. 61573235, the Natural Science Foundation of Jiangsu Province under Grant No. BK20141152, the Humanity and Social Science Youth Foundation of the Ministry of Education of China under Grant No. 15YJCZH129, the Qing Lan Project of Jiangsu Province of China, the Jiangsu Key Laboratory of Big Data Analysis Technology/B-DAT (Nanjing University of Information Science & Technology) under Grant No. KXK1402, and the Key Laboratory of Cloud Computing and Intelligent Information Processing of Changzhou City under Grant No. CM20123004.

Author information

Corresponding author

Correspondence to Jin Qian.

Additional information

This is an extended version of the paper presented at the 2015 IEEE International Conference on Machine Learning and Cybernetics, Guangzhou, China.

About this article

Cite this article

Qian, J., Xia, M. & Yue, X. Parallel knowledge acquisition algorithms for big data using MapReduce. Int. J. Mach. Learn. & Cyber. 9, 1007–1021 (2018). https://doi.org/10.1007/s13042-016-0624-x

  • DOI: https://doi.org/10.1007/s13042-016-0624-x