Compressed Bitmaps Based Frequent Itemsets Mining on Hadoop

Authors:
Aref A. Saeed, Azhar Rauf, Shah Khusro, Saeed Mahfooz

Department of Computer Science, University of Peshawar, Peshawar, Pakistan

INFOS '16: Proceedings of the 10th International Conference on Informatics and Systems, May 2016, Pages 159–165. https://doi.org/10.1145/2908446.2908457

Published: 09 May 2016

ABSTRACT

Frequent itemsets mining is one of the notable applications of data mining. Data mining has recently received a great deal of attention because of the explosive growth in data and the economic and scientific need to turn such data into useful information. However, traditional frequent itemsets mining algorithms can no longer process large datasets efficiently on a single machine because of limits on computational power and memory. Current methods tend to control execution time and output size by using higher minimum support thresholds, which lead to fewer candidates and fewer frequent itemsets. In this paper, an improved version of the Apriori-like HFDM-EB algorithm is proposed for mining frequent itemsets over big transactional data; it runs on the Hadoop framework, uses compressed bitmaps, and can handle lower minimum support thresholds. The experimental results show that the improved algorithm is efficient and scalable for mining frequent itemsets in big data.
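
To make the terms in the abstract concrete, the following is a minimal single-machine sketch of Apriori-style mining in which each item is represented by a bitmap over transaction IDs. It is not the HFDM-EB algorithm or the authors' Hadoop implementation: the function names (`build_item_bitmaps`, `support`, `frequent_itemsets`) are hypothetical, plain Python integers stand in for bitmaps, and no compression or distribution is applied.

```python
from itertools import combinations


def build_item_bitmaps(transactions):
    """Map each item to an int whose bit i is set when transaction i contains the item."""
    bitmaps = {}
    for tid, items in enumerate(transactions):
        for item in items:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps


def support(itemset, bitmaps, n_transactions):
    """Count transactions containing every item: AND the bitmaps, then count set bits."""
    acc = (1 << n_transactions) - 1  # start with all transactions selected
    for item in itemset:
        acc &= bitmaps.get(item, 0)
    return bin(acc).count("1")


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search; min_support is an absolute transaction count."""
    n = len(transactions)
    bitmaps = build_item_bitmaps(transactions)
    # Level 1: frequent single items, kept as sorted 1-tuples.
    level = sorted(
        (item,) for item in bitmaps if support((item,), bitmaps, n) >= min_support
    )
    result = {}
    while level:
        for itemset in level:
            result[itemset] = support(itemset, bitmaps, n)
        previous = set(level)
        candidates = set()
        # Join step: merge two frequent k-itemsets that share their first k-1 items.
        for a, b in combinations(level, 2):
            if a[:-1] == b[:-1]:
                cand = a + (b[-1],)
                # Prune step: every k-subset of the candidate must itself be frequent.
                if all(sub in previous for sub in combinations(cand, len(cand) - 1)):
                    candidates.add(cand)
        level = sorted(c for c in candidates if support(c, bitmaps, n) >= min_support)
    return result
```

On a toy input, lowering `min_support` admits more candidates and more frequent itemsets, which is the cost the abstract says current methods avoid by choosing higher thresholds.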

Published in

INFOS '16: Proceedings of the 10th International Conference on Informatics and Systems
May 2016
347 pages
ISBN:9781450340625
DOI:10.1145/2908446

Copyright © 2016 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Bitmaps
Distributed Data Mining
Frequent Itemsets Mining
Hadoop
Qualifiers
- research-article
- Research
- Refereed limited