research-article

Parallel rough set based knowledge acquisition using MapReduce from big data

Authors:
Junbo Zhang (Southwest Jiaotong University, Chengdu, China and Georgia State University, Atlanta, GA)
Tianrui Li (Southwest Jiaotong University, Chengdu, China)
Yi Pan (Georgia State University, Atlanta, GA)

BigMine '12: Proceedings of the 1st International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications, August 2012, Pages 20–27, https://doi.org/10.1145/2351316.2351320

Published: 12 August 2012

ABSTRACT

With the volume of data growing at an unprecedented rate, big data mining and knowledge discovery have become a new challenge. Rough set theory for knowledge acquisition has been successfully applied in data mining. The recently introduced MapReduce technique has received much attention from both the scientific community and industry for its applicability to big data analysis. To mine knowledge from big data, this paper presents parallel rough set based methods for knowledge acquisition using MapReduce. Comprehensive experimental evaluation on large data sets shows that the proposed parallel methods can effectively process big data.
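
The abstract does not spell out the parallel methods themselves. As an illustration only, the following minimal sketch shows how the standard rough set lower and upper approximations of a decision class could be computed in a map/reduce style. It is not the authors' algorithm: the function names, the toy decision table, and the single-process execution are all assumptions made for this example, and a real deployment would run the two phases on a MapReduce runtime such as Hadoop.

```python
from collections import defaultdict


def map_phase(records, condition_attrs, decision_attr):
    """Map: emit (condition-value tuple, decision value) for each record.

    Records sharing the same key belong to the same equivalence class.
    """
    for record in records:
        key = tuple(record[a] for a in condition_attrs)
        yield key, record[decision_attr]


def reduce_phase(pairs):
    """Reduce: group by key and collect the decision values seen per class."""
    classes = defaultdict(set)
    for key, decision in pairs:
        classes[key].add(decision)
    return dict(classes)


def approximations(classes, target):
    """Return the equivalence classes in the lower and upper approximation.

    Lower: classes whose records all carry the target decision.
    Upper: classes with at least one record carrying the target decision.
    """
    lower = {key for key, decisions in classes.items() if decisions == {target}}
    upper = {key for key, decisions in classes.items() if target in decisions}
    return lower, upper


# Hypothetical toy decision table: condition attributes "a", "b"; decision "d".
records = [
    {"a": 1, "b": 0, "d": "yes"},
    {"a": 1, "b": 0, "d": "no"},   # conflicts with the record above
    {"a": 0, "b": 1, "d": "yes"},
    {"a": 0, "b": 0, "d": "no"},
]

classes = reduce_phase(map_phase(records, ["a", "b"], "d"))
lower, upper = approximations(classes, "yes")
print(sorted(lower))  # [(0, 1)]
print(sorted(upper))  # [(0, 1), (1, 0)]
```

In this toy table the class (1, 0) contains both decisions, so it falls in the upper but not the lower approximation of "yes". Because each record is mapped independently and the grouping is keyed, the same logic can be partitioned across many workers.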



Published in
BigMine '12: Proceedings of the 1st International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications
August 2012
134 pages
ISBN:9781450315470
DOI:10.1145/2351316
Program Chairs:
Wei Fan (IBM T.J. Watson Research)
Albert Bifet (University of Waikato)
Qiang Yang (Hong Kong University of Science and Technology)
Philip Yu (University of Illinois at Chicago)
Copyright © 2012 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
MapReduce
big data
knowledge acquisition
rough sets
Acceptance Rates
Overall Acceptance Rate: 13 of 23 submissions, 57%

Article Metrics
- Total Citations: 26
- Total Downloads: 1,302
- Downloads (last 12 months): 3
- Downloads (last 6 weeks): 0