Parallel Attribute Reduction Based on MapReduce

Xi, Dachao; Wang, Guoyin; Zhang, Xuerui; Zhang, Fan

doi:10.1007/978-3-319-11740-9_58

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8818)

Included in the following conference series:

International Conference on Rough Sets and Knowledge Technology

Abstract

With the explosive growth of data, many parallel attribute reduction algorithms have been studied. To improve their efficiency, this paper proposes a new parallel attribute reduction algorithm based on MapReduce. It consists of three parts: parallel computation of a simplified decision table, parallel computation of attribute significance, and parallel computation of the decision table. Experiments on data sets of different sizes show that the algorithm can process massive data efficiently.
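The abstract does not give the details of the three parts, so the following is only an illustrative sketch, not the authors' algorithm. It assumes the standard rough-set definitions: a simplified decision table keeps one entry per equivalence class of objects that agree on the chosen condition attributes, and the significance of an attribute is the gain in positive-region size when it is added. The map and reduce steps are emulated in a single Python process; all function names and the toy data are hypothetical.

```python
from collections import defaultdict

# Toy decision table (hypothetical data): two condition attributes, then the decision.
rows = [
    (0, 0, "no"),
    (0, 1, "yes"),
    (1, 0, "yes"),
    (1, 0, "yes"),
    (0, 1, "no"),
]

def map_phase(rows, attrs):
    """Map: emit (key, decision), where key holds the values on the chosen attributes."""
    for *conditions, decision in rows:
        yield tuple(conditions[a] for a in attrs), decision

def reduce_phase(pairs):
    """Reduce: per key, collect the set of decisions and the number of objects."""
    groups = defaultdict(lambda: [set(), 0])
    for key, decision in pairs:
        groups[key][0].add(decision)
        groups[key][1] += 1
    return {key: (frozenset(ds), n) for key, (ds, n) in groups.items()}

def simplified_table(rows, attrs):
    """One entry per equivalence class induced by attrs (assumed definition)."""
    return reduce_phase(map_phase(rows, attrs))

def positive_region_size(rows, attrs):
    """Count objects whose equivalence class has a single decision value."""
    table = simplified_table(rows, attrs)
    return sum(n for decisions, n in table.values() if len(decisions) == 1)

def significance(rows, selected, candidate):
    """Gain in positive-region size from adding candidate to the selected attributes."""
    return (positive_region_size(rows, selected + [candidate])
            - positive_region_size(rows, selected))

if __name__ == "__main__":
    print(simplified_table(rows, [0, 1]))
    print(significance(rows, [], 0), significance(rows, [0], 1))
```

On a real cluster, the map and reduce functions would run over splits of the decision table rather than over an in-memory list; the grouping key and the per-class summary are the parts this sketch is meant to illustrate.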

Author information

Authors and Affiliations

Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China
Dachao Xi
Institute of Electronic Information & Technology, Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, Chongqing, 401122, China
Dachao Xi, Guoyin Wang, Xuerui Zhang & Fan Zhang

Editor information

Editors and Affiliations

Tongji University, Shanghai, China
Duoqian Miao
Department of Electrical and Computer En, University of Alberta, Edmonton, Alberta, Canada
Witold Pedrycz
University of Warsaw, Warsaw, Poland
Dominik Ślȩzak
University of Applied Sciences, München, Germany
Georg Peters
Tianjin University, Tianjin, China
Qinghua Hu
Tongji University, Shanghai, China
Ruizhi Wang

About this paper

Cite this paper

Xi, D., Wang, G., Zhang, X., Zhang, F. (2014). Parallel Attribute Reduction Based on MapReduce. In: Miao, D., Pedrycz, W., Ślȩzak, D., Peters, G., Hu, Q., Wang, R. (eds) Rough Sets and Knowledge Technology. RSKT 2014. Lecture Notes in Computer Science, vol 8818. Springer, Cham. https://doi.org/10.1007/978-3-319-11740-9_58

DOI: https://doi.org/10.1007/978-3-319-11740-9_58
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11739-3
Online ISBN: 978-3-319-11740-9
eBook Packages: Computer Science, Computer Science (R0)
