Scalable Implementations of Rough Set Algorithms: A Survey

  • Conference paper
Recent Trends and Future Technology in Applied Intelligence (IEA/AIE 2018)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10868)

Abstract

With the rapid change in the volume, variety, and velocity of data across real-life domains, learning from big data has become a growing challenge. Rough set theory has been successfully applied to knowledge discovery in databases (KDD) for handling imperfect data. Most traditional rough set algorithms were implemented sequentially and ran on a single machine, which makes them computationally expensive and inefficient for massive data. Recent computing frameworks, such as MapReduce and Apache Spark, have made it possible to realize parallel rough set algorithms on distributed clusters of commodity computers and to speed up big data analyses. Although a variety of scalable rough set implementations have been developed, (1) most studies compared their work with outdated sequential implementations; (2) certain distributed computing frameworks were used more frequently than others, while recently developed frameworks were overlooked; and (3) discussion of open issues and guidance on adopting new computing frameworks are lacking. The main objective of this paper is to survey the current state of the art in scalable implementations of rough set algorithms. The paper should help researchers catch up with recent developments in this field and offers some insights for developing rough set algorithms in up-to-date high-performance computing environments for big data analytics.
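To make the idea of a parallel rough set algorithm concrete, the following is a minimal single-process sketch of how lower and upper approximations could be computed in a map/reduce style. It is not taken from the paper or from any of the surveyed implementations: the function names (`map_phase`, `reduce_phase`, `approximations`) and the toy decision table are assumptions for illustration, and a real deployment would run the map and reduce steps on a framework such as Hadoop MapReduce or Apache Spark rather than in plain Python.

```python
from collections import defaultdict

def map_phase(record, condition_attrs):
    """Map step (hypothetical): emit (key, object_id), where the key is the
    tuple of the object's values on the chosen condition attributes."""
    obj_id, attrs = record
    return (tuple(attrs[a] for a in condition_attrs), obj_id)

def reduce_phase(pairs):
    """Reduce step (hypothetical): objects sharing a key are indiscernible,
    so each key collects one equivalence class."""
    classes = defaultdict(set)
    for key, obj_id in pairs:
        classes[key].add(obj_id)
    return classes

def approximations(classes, target):
    """Lower approximation: union of classes fully inside the target set.
    Upper approximation: union of classes that intersect the target set."""
    lower, upper = set(), set()
    for members in classes.values():
        if members <= target:
            lower |= members
        if members & target:
            upper |= members
    return lower, upper

# Toy decision table (invented for illustration): (object id, attribute values).
records = [
    (1, {"headache": "yes", "temp": "high", "flu": "yes"}),
    (2, {"headache": "yes", "temp": "high", "flu": "no"}),
    (3, {"headache": "no", "temp": "high", "flu": "yes"}),
    (4, {"headache": "no", "temp": "normal", "flu": "no"}),
]
condition_attrs = ["headache", "temp"]

pairs = [map_phase(r, condition_attrs) for r in records]  # parallelizable per record
classes = reduce_phase(pairs)                             # grouped by key
target = {obj_id for obj_id, attrs in records if attrs["flu"] == "yes"}
lower, upper = approximations(classes, target)
print("lower:", sorted(lower), "upper:", sorted(upper))
```

Objects 1 and 2 share the same condition values but differ in the decision, so they fall in the upper approximation of the "flu = yes" concept but not in the lower one. The map step touches each record independently, which is the property that lets such computations be distributed across a cluster.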

Author information

Corresponding author

Correspondence to Bing Zhou.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Zhou, B., Cho, H., Zhang, X. (2018). Scalable Implementations of Rough Set Algorithms: A Survey. In: Mouhoub, M., Sadaoui, S., Ait Mohamed, O., Ali, M. (eds) Recent Trends and Future Technology in Applied Intelligence. IEA/AIE 2018. Lecture Notes in Computer Science, vol 10868. Springer, Cham. https://doi.org/10.1007/978-3-319-92058-0_62

  • DOI: https://doi.org/10.1007/978-3-319-92058-0_62

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-92057-3

  • Online ISBN: 978-3-319-92058-0

  • eBook Packages: Computer Science, Computer Science (R0)
