Abstract
With the explosive growth of data, many variants of parallel attribute reduction algorithms have been studied. To improve efficiency, this paper proposes a new parallel attribute reduction algorithm based on MapReduce. It consists of three parts: parallel computation of a simplified decision table, parallel computation of attribute significance, and parallel computation of the decision table. Experiments are run on data sets of different sizes. The experimental results show that the algorithm can process massive data efficiently.
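The first two parts named in the abstract can be illustrated with a small in-memory sketch. This is not the authors' algorithm, whose details are not given here. It assumes that "simplifying" a decision table means merging objects with identical condition values, and that attribute significance is measured by the growth of the positive region, a common rough-set choice. The helper names (`map_reduce`, `simplify`, `positive_region_size`, `significance`) and the decision column name `"d"` are hypothetical.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Tiny in-memory stand-in for one MapReduce job (assumption: no Hadoop)."""
    groups = defaultdict(list)
    for rec in records:
        for key, value in mapper(rec):
            groups[key].append(value)  # shuffle: group values by key
    return [out for key, values in groups.items() for out in reducer(key, values)]


def simplify(table, cond):
    """Merge rows that agree on every condition attribute.

    Each output entry is (condition values, set of decisions, object count).
    """
    def mapper(row):
        yield tuple(row[a] for a in cond), row["d"]

    def reducer(key, decisions):
        yield key, frozenset(decisions), len(decisions)

    return map_reduce(table, mapper, reducer)


def positive_region_size(simplified, cond, subset):
    """Count objects whose equivalence class under `subset` has one decision."""
    idx = [cond.index(a) for a in subset]

    def mapper(entry):
        key, decisions, count = entry
        yield tuple(key[i] for i in idx), (decisions, count)

    def reducer(key, values):
        decisions = frozenset().union(*(d for d, _ in values))
        total = sum(c for _, c in values)
        # A consistent class contributes all its objects, an inconsistent one none.
        yield total if len(decisions) == 1 else 0

    return sum(map_reduce(simplified, mapper, reducer))


def significance(simplified, cond, reduct, attr, n):
    """Positive-region gain from adding `attr` to `reduct`, normalised by n objects."""
    before = positive_region_size(simplified, cond, reduct)
    after = positive_region_size(simplified, cond, reduct + [attr])
    return (after - before) / n


# Toy decision table: condition attributes "a" and "b", decision "d".
table = [
    {"a": 0, "b": 0, "d": "no"},
    {"a": 0, "b": 0, "d": "no"},
    {"a": 0, "b": 1, "d": "yes"},
    {"a": 1, "b": 1, "d": "yes"},
    {"a": 1, "b": 0, "d": "no"},
]
cond = ["a", "b"]
simplified = simplify(table, cond)
```

In this toy table the decision depends only on `b`, so `significance(simplified, cond, [], "b", len(table))` is 1.0 and the same call for `"a"` is 0.0. Working on the simplified table rather than the raw rows keeps the second job's input small when many objects share the same condition values.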
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Xi, D., Wang, G., Zhang, X., Zhang, F. (2014). Parallel Attribute Reduction Based on MapReduce. In: Miao, D., Pedrycz, W., Ślȩzak, D., Peters, G., Hu, Q., Wang, R. (eds) Rough Sets and Knowledge Technology. RSKT 2014. Lecture Notes in Computer Science, vol 8818. Springer, Cham. https://doi.org/10.1007/978-3-319-11740-9_58
DOI: https://doi.org/10.1007/978-3-319-11740-9_58
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11739-3
Online ISBN: 978-3-319-11740-9
eBook Packages: Computer Science, Computer Science (R0)