Parallel knowledge acquisition algorithms for big data using MapReduce

Qian, Jin; Xia, Min; Yue, Xiaodong

doi:10.1007/s13042-016-0624-x

Original Article
Published: 01 February 2017

Volume 9, pages 1007–1021, (2018)

International Journal of Machine Learning and Cybernetics

Abstract

With the volume of data growing at an unprecedented rate, knowledge acquisition for big data has become a new challenge. To address this issue, information granules are constructed in different hierarchical decision tables. The changes in the support, confidence, and coverage of hierarchical decision rules are then discussed to explain the relationships between the condition granules and the decision granule. Four different strategies for attribute level ascension are designed. In most cases, attribute level ascension reduces the number of decision rules. An efficient parallel knowledge acquisition framework for big data using MapReduce is proposed and implemented. The experimental results demonstrate that the proposed algorithms can mine hierarchical decision rules at different levels of granularity for big data.
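
As a rough illustration of the measures named in the abstract, the sketch below counts condition/decision granules in a map-then-reduce style and derives support, confidence, and coverage for each rule, then repeats the computation after one attribute is lifted one level in a concept hierarchy. It is not the authors' algorithm: the definitions used are the common rough-set ones (support = |X ∩ D| / |U|, confidence = |X ∩ D| / |X|, coverage = |X ∩ D| / |D|), which the paper may define differently, and the function names, the toy table, and the hierarchy are all hypothetical. The map and reduce steps run in a single process here rather than on a cluster.

```python
from collections import Counter
from itertools import chain


def map_record(record, decision_attr):
    """Map step: emit ((condition granule, decision value), 1) for one record."""
    condition = tuple(sorted((a, v) for a, v in record.items() if a != decision_attr))
    yield (condition, record[decision_attr]), 1


def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each key."""
    counts = Counter()
    for key, n in pairs:
        counts[key] += n
    return counts


def rule_measures(records, decision_attr):
    """Return support, confidence and coverage for every rule: condition -> decision."""
    pairs = chain.from_iterable(map_record(r, decision_attr) for r in records)
    joint = reduce_counts(pairs)

    # Sizes of each condition granule X and each decision granule D.
    cond_sizes, dec_sizes = Counter(), Counter()
    for (cond, dec), n in joint.items():
        cond_sizes[cond] += n
        dec_sizes[dec] += n
    total = sum(joint.values())

    return {
        (cond, dec): {
            "support": n / total,
            "confidence": n / cond_sizes[cond],
            "coverage": n / dec_sizes[dec],
        }
        for (cond, dec), n in joint.items()
    }


def ascend(records, attr, parent_of):
    """Attribute level ascension: replace each value of `attr` by its parent concept."""
    return [{**r, attr: parent_of[r[attr]]} for r in records]


# Hypothetical toy decision table and one-level concept hierarchy.
table = [
    {"item": "apple", "buy": "yes"},
    {"item": "pear", "buy": "yes"},
    {"item": "carrot", "buy": "no"},
    {"item": "carrot", "buy": "yes"},
]
parent_of = {"apple": "fruit", "pear": "fruit", "carrot": "vegetable"}

fine = rule_measures(table, "buy")
coarse = rule_measures(ascend(table, "item", parent_of), "buy")
print(len(fine), len(coarse))
```

On this toy table the two rules for apple and pear merge into a single rule for fruit after ascension, so the rule count drops from 4 to 3 while the merged rule's support and coverage grow. Whether the count actually shrinks depends on the data and the hierarchy, which is consistent with the abstract's "in most cases".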

Acknowledgements

The research is supported by the National Natural Science Foundation of China under Grant No. 61573235, the Natural Science Foundation of Jiangsu Province under Grant No. BK20141152, the Humanity and Social Science Youth Foundation of the Ministry of Education of China under Grant No. 15YJCZH129, the Qing Lan Project of Jiangsu Province of China, the Jiangsu Key Laboratory of Big Data Analysis Technology/B-DAT (Nanjing University of Information Science & Technology) under Grant No. KXK1402, and the Key Laboratory of Cloud Computing and Intelligent Information Processing of Changzhou City under Grant No. CM20123004.

Author information

Authors and Affiliations

School of Computer Engineering, Jiangsu University of Technology, Changzhou, 213001, China
Jin Qian
Jiangsu Key Laboratory of Big Data Analysis Technology/B-DAT, Nanjing University of Information Science and Technology, Nanjing, 210044, China
Jin Qian & Min Xia
School of Computer Engineering and Science, Shanghai University, Shanghai, 200444, China
Xiaodong Yue

Corresponding author

Correspondence to Jin Qian.

Additional information

This is an extended version of the paper presented at the 2015 IEEE International Conference on Machine Learning and Cybernetics, Guangzhou, China.

About this article

Cite this article

Qian, J., Xia, M. & Yue, X. Parallel knowledge acquisition algorithms for big data using MapReduce. Int. J. Mach. Learn. & Cyber. 9, 1007–1021 (2018). https://doi.org/10.1007/s13042-016-0624-x

Received: 16 February 2016
Accepted: 07 December 2016
Published: 01 February 2017
Issue Date: June 2018
DOI: https://doi.org/10.1007/s13042-016-0624-x
