ABSTRACT
With the volume of data growing at an unprecedented rate, big data mining and knowledge discovery have become a new challenge. Rough set theory for knowledge acquisition has been successfully applied in data mining. The recently introduced MapReduce technique has received much attention from both the scientific community and industry for its applicability to big data analysis. To mine knowledge from big data, this paper presents parallel rough set based methods for knowledge acquisition using MapReduce. A comprehensive experimental evaluation on large data sets shows that the proposed parallel methods can effectively process big data.