DOI: 10.1145/2351316.2351320
research-article

Parallel rough set based knowledge acquisition using MapReduce from big data

Published: 12 August 2012

ABSTRACT

With the volume of data growing at an unprecedented rate, big data mining and knowledge discovery have become a new challenge. Rough set theory for knowledge acquisition has been successfully applied in data mining. The recently introduced MapReduce technique has received much attention from both the scientific community and industry for its applicability to big data analysis. To mine knowledge from big data, we present parallel rough set based methods for knowledge acquisition using MapReduce. Comprehensive experimental evaluation on large data sets shows that the proposed parallel methods can effectively process big data.
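The abstract does not describe the parallel methods themselves, so the following is only a minimal sketch of how a basic rough set computation (lower and upper approximations of each decision class) could be phrased as map, shuffle, and reduce steps. It is not the authors' algorithm and it runs in a single process rather than on Hadoop. The function names (`map_phase`, `shuffle`, `reduce_phase`, `approximations`) and the toy decision table are assumptions made for illustration.

```python
from collections import defaultdict


def map_phase(records):
    """Map: emit (condition-attribute tuple, (object id, decision)) per record."""
    for obj_id, conditions, decision in records:
        yield tuple(conditions), (obj_id, decision)


def shuffle(pairs):
    """Shuffle: group values by key, as a MapReduce runtime does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: each key is one equivalence class; place it in the approximations."""
    lower = defaultdict(set)
    upper = defaultdict(set)
    for members in groups.values():
        ids = {obj_id for obj_id, _ in members}
        decisions = {decision for _, decision in members}
        # The class overlaps every decision value it contains (upper approximation).
        for decision in decisions:
            upper[decision] |= ids
        # The class is certain only if all its objects share one decision (lower).
        if len(decisions) == 1:
            lower[next(iter(decisions))] |= ids
    return dict(lower), dict(upper)


def approximations(records):
    """Run the three simulated phases in order."""
    return reduce_phase(shuffle(map_phase(records)))


# Hypothetical decision table: (object id, condition values, decision).
TABLE = [
    (1, ("high", "yes"), "flu"),
    (2, ("high", "yes"), "flu"),
    (3, ("high", "no"), "flu"),
    (4, ("high", "no"), "healthy"),
    (5, ("normal", "no"), "healthy"),
]

lower, upper = approximations(TABLE)
```

In this toy table, objects 3 and 4 share the same condition values but have different decisions, so they fall outside every lower approximation and inside both upper approximations. Grouping by condition values is what makes the computation a natural fit for a keyed shuffle, since each equivalence class can be reduced independently.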


          • Published in

            BigMine '12: Proceedings of the 1st International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications
            August 2012
            134 pages
            ISBN:9781450315470
            DOI:10.1145/2351316

            Copyright © 2012 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Acceptance Rates

            Overall Acceptance Rate: 13 of 23 submissions, 57%
