Scalable Implementations of Rough Set Algorithms: A Survey

Zhou, Bing; Cho, Hyuk; Zhang, Xin

doi:10.1007/978-3-319-92058-0_62

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10868)

Included in the following conference series:

International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems

Abstract

With the rapid growth in the volume, variety, and velocity of data across real-life domains, learning from big data has become a growing challenge. Rough set theory has been successfully applied to knowledge discovery in databases (KDD) for handling imperfect data. Most traditional rough set algorithms were implemented sequentially to run on a single machine, which makes them computationally expensive and inefficient on massive data. Recent computing frameworks, such as MapReduce and Apache Spark, have made it possible to implement parallel rough set algorithms on distributed clusters of commodity computers and to speed up big data analyses. Although a variety of scalable rough set implementations have been developed, (1) most studies compared their work with outdated sequential implementations; (2) a few distributed computing frameworks have been used far more often than others, while more recently developed frameworks have been overlooked; and (3) discussions of open issues and guidance on adopting new computing frameworks are lacking. The main objective of this paper is to survey the current state-of-the-art scalable implementations of rough set algorithms. It aims to help researchers catch up with recent developments in this field and to offer insights for developing rough set algorithms in up-to-date high-performance computing environments for big data analytics.
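To make the idea of a parallel rough set algorithm more concrete, here is a minimal single-process Python sketch of how the lower and upper approximations of a decision class can be decomposed into map, shuffle, and reduce steps. It is an illustration under stated assumptions, not a method taken from this paper or from any particular surveyed implementation: the functions `map_phase`, `shuffle`, `reduce_phase`, and `approximations` and the toy decision table are hypothetical, and they only imitate the phases with plain functions rather than using the Hadoop MapReduce or Apache Spark APIs.

```python
# Hypothetical sketch: plain-Python stand-ins for map/shuffle/reduce phases,
# not a framework API and not a specific published rough set algorithm.
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Set, Tuple

# One row of a decision table: (object id, condition-attribute values, decision value).
Row = Tuple[int, Tuple[Hashable, ...], Hashable]
Pair = Tuple[Tuple[Hashable, ...], Tuple[int, Hashable]]


def map_phase(partition: Sequence[Row]) -> List[Pair]:
    """Map: key each object by its condition values, i.e. by its equivalence class."""
    return [(cond, (oid, dec)) for oid, cond, dec in partition]


def shuffle(pairs: List[Pair]) -> Dict[Tuple[Hashable, ...], List[Tuple[int, Hashable]]]:
    """Shuffle: group mapped pairs by key, as a framework would do across nodes."""
    groups: Dict[Tuple[Hashable, ...], List[Tuple[int, Hashable]]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(values: List[Tuple[int, Hashable]], target: Hashable) -> Tuple[Set[int], bool, bool]:
    """Reduce: decide how one equivalence class relates to the target decision class."""
    ids = {oid for oid, _ in values}
    decisions = {dec for _, dec in values}
    in_lower = decisions == {target}  # class lies entirely inside the target set
    in_upper = target in decisions    # class overlaps the target set
    return ids, in_lower, in_upper


def approximations(partitions: Sequence[Sequence[Row]], target: Hashable) -> Tuple[Set[int], Set[int]]:
    """Return (lower, upper) approximations of the objects whose decision is `target`."""
    mapped = [pair for part in partitions for pair in map_phase(part)]
    lower: Set[int] = set()
    upper: Set[int] = set()
    for values in shuffle(mapped).values():
        ids, in_lower, in_upper = reduce_phase(values, target)
        if in_lower:
            lower |= ids
        if in_upper:
            upper |= ids
    return lower, upper


# Hypothetical toy decision table split into two partitions, as if stored on two nodes.
EXAMPLE_PARTITIONS: List[List[Row]] = [
    [(1, ("high", "yes"), "flu"), (2, ("high", "no"), "flu"), (3, ("normal", "yes"), "no_flu")],
    [(4, ("high", "yes"), "no_flu"), (5, ("normal", "no"), "no_flu"), (6, ("high", "no"), "flu")],
]

if __name__ == "__main__":
    lower, upper = approximations(EXAMPLE_PARTITIONS, "flu")
    print("lower:", sorted(lower))             # lower: [2, 6]
    print("upper:", sorted(upper))             # upper: [1, 2, 4, 6]
    print("boundary:", sorted(upper - lower))  # boundary: [1, 4]
```

The point of the decomposition is that once the shuffle has grouped the objects of each equivalence class, every class can be classified independently, which is what makes the reduce step parallelizable. In an actual Spark job, the grouping would typically be expressed with a key-based operation on a pair RDD, such as `groupByKey` or `reduceByKey`.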

Author information

Authors and Affiliations

Department of Computer Science, Sam Houston State University, Huntsville, TX, 77341, USA
Bing Zhou, Hyuk Cho & Xin Zhang

Corresponding author

Correspondence to Bing Zhou.

Editor information

Editors and Affiliations

University of Regina, Regina, SK, Canada
Malek Mouhoub
University of Regina, Regina, SK, Canada
Samira Sadaoui
Concordia University, Montreal, QC, Canada
Otmane Ait Mohamed
Texas State University, San Marcos, TX, USA
Moonis Ali

About this paper

Cite this paper

Zhou, B., Cho, H., Zhang, X. (2018). Scalable Implementations of Rough Set Algorithms: A Survey. In: Mouhoub, M., Sadaoui, S., Ait Mohamed, O., Ali, M. (eds) Recent Trends and Future Technology in Applied Intelligence. IEA/AIE 2018. Lecture Notes in Computer Science, vol 10868. Springer, Cham. https://doi.org/10.1007/978-3-319-92058-0_62

DOI: https://doi.org/10.1007/978-3-319-92058-0_62
Published: 30 May 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-92057-3
Online ISBN: 978-3-319-92058-0
eBook Packages: Computer Science; Computer Science (R0)
