On the design of hardware-software architectures for frequent itemsets mining on data streams

Bustio-Martínez, Lázaro; Cumplido, René; Hernández-León, Raudel; Bande-Serrano, José M.; Feregrino-Uribe, Claudia

doi:10.1007/s10844-017-0461-8

Published: 16 May 2017

Volume 50, pages 415–440, (2018)

Journal of Intelligent Information Systems

Lázaro Bustio-Martínez ORCID: orcid.org/0000-0002-0273-0520¹,
René Cumplido¹,
Raudel Hernández-León²,
José M. Bande-Serrano² &
…
Claudia Feregrino-Uribe¹


Abstract

Frequent Itemsets Mining has been applied in many data processing applications with remarkable results. Recently, data stream processing has gained considerable attention because of its practical applications. Data in data streams are transmitted at high rates and cannot be stored for offline processing, which makes it impractical to apply traditional data mining approaches (such as Frequent Itemsets Mining) directly to data streams. In this paper, two single-pass parallel algorithms based on a tree data structure are proposed for Frequent Itemsets Mining on data streams. The algorithms use the Landmark and Sliding Window Models for window handling. As in other reviewed papers, if the number of frequent items in the data stream is low, the proposed algorithms perform an exact mining process. Conversely, if the number of frequent patterns is large, the mining process is approximate, with no false positives produced. The experiments conducted show that the presented algorithms achieve shorter processing times than the hardware architectures reported in the state of the art.
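
The two window models named in the abstract can be illustrated with a minimal software sketch. This is not the tree-based parallel algorithm or the hardware architecture proposed in the paper; it is plain exact counting in Python, and the class names, the `max_size` limit on itemset length, and the `min_support` fraction are all assumptions introduced here for illustration. The Landmark Model counts every transaction seen since a fixed starting point, while the Sliding Window Model keeps only the most recent transactions and discards the contribution of expired ones.

```python
from collections import Counter, deque
from itertools import combinations


def itemsets(transaction, max_size=2):
    """Yield every non-empty itemset of up to max_size items as a sorted tuple."""
    items = sorted(set(transaction))
    for size in range(1, max_size + 1):
        yield from combinations(items, size)


class LandmarkCounter:
    """Hypothetical helper: counts itemsets over all transactions since the landmark."""

    def __init__(self, max_size=2):
        self.max_size = max_size
        self.counts = Counter()
        self.seen = 0  # number of transactions currently covered by the counts

    def add(self, transaction):
        self.seen += 1
        self.counts.update(itemsets(transaction, self.max_size))

    def frequent(self, min_support):
        """Return itemsets whose count is at least min_support * transactions covered."""
        threshold = min_support * self.seen
        return {s: c for s, c in self.counts.items() if c >= threshold}


class SlidingWindowCounter(LandmarkCounter):
    """Hypothetical helper: counts itemsets over only the last `window` transactions."""

    def __init__(self, window, max_size=2):
        super().__init__(max_size)
        self.window = window
        self.buffer = deque()

    def add(self, transaction):
        super().add(transaction)
        self.buffer.append(transaction)
        if len(self.buffer) > self.window:
            # The oldest transaction leaves the window: undo its contribution.
            expired = self.buffer.popleft()
            self.seen -= 1
            self.counts.subtract(itemsets(expired, self.max_size))
            self.counts = +self.counts  # unary plus drops entries that fell to zero


stream = [["a", "b"], ["a", "b"], ["a", "c"], ["c", "d"], ["c", "d"]]
landmark = LandmarkCounter()
sliding = SlidingWindowCounter(window=3)
for transaction in stream:
    landmark.add(transaction)
    sliding.add(transaction)

print(landmark.frequent(0.5))
print(sliding.frequent(0.5))
```

On this toy stream the two models disagree: the landmark counts still reflect the early `a` transactions, whereas the sliding window has forgotten them and reports `c`, `d`, and the pair `(c, d)`. Storing every itemset count, as this sketch does, is only feasible when few itemsets occur, which is the regime the abstract describes as exact mining.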


Notes

A systolic tree is an arrangement of pipelined processing elements in a multidimensional tree pattern.
http://www.cs.loyola.edu/cgiannel/assoc_gen.html


Author information

Authors and Affiliations

Computer Sciences Department, National Institute for Astrophysics, Optics, and Electronics, Luis Enrique Erro # 1, Sta. María Tonantzintla, CP: 72840, Puebla, México
Lázaro Bustio-Martínez, René Cumplido & Claudia Feregrino-Uribe
Advanced Technologies Application Center, 7a # 21406, Siboney, Playa, CP: 12200, Havana, Cuba
Raudel Hernández-León & José M. Bande-Serrano

Corresponding author

Correspondence to Lázaro Bustio-Martínez.

About this article

Cite this article

Bustio-Martínez, L., Cumplido, R., Hernández-León, R. et al. On the design of hardware-software architectures for frequent itemsets mining on data streams. J Intell Inf Syst 50, 415–440 (2018). https://doi.org/10.1007/s10844-017-0461-8

Received: 07 June 2016
Revised: 30 March 2017
Accepted: 12 April 2017
Published: 16 May 2017
Issue Date: June 2018
DOI: https://doi.org/10.1007/s10844-017-0461-8
