Finding efficiencies in frequent pattern mining from big uncertain data

Leung, Carson Kai-Sang; MacKinnon, Richard Kyle; Jiang, Fan

doi:10.1007/s11280-016-0411-3

Published: 06 September 2016

World Wide Web, Volume 20, pages 571–594 (2017)

Carson Kai-Sang Leung¹,
Richard Kyle MacKinnon¹ &
Fan Jiang¹

Abstract

Many existing data mining algorithms search for interesting patterns in transactional databases of precise data. However, there are situations in which data are uncertain. Items in each transaction of these probabilistic databases of uncertain data are usually associated with existential probabilities, which express the likelihood that these items are present in the transaction. Compared with mining from precise data, the search space for mining from uncertain data is much larger due to the presence of the existential probabilities. This problem worsens as we move into the era of Big data. Furthermore, in many real-life applications, users may be interested in only a tiny portion of this large search space for Big data mining. Because they give users no opportunity to express which patterns are interesting, many existing data mining algorithms return numerous patterns, of which only some are interesting. In this article, we propose an algorithm that (i) allows users to express their interest in terms of constraints, (ii) uses the MapReduce model to mine uncertain Big data for frequent patterns that satisfy the user-specified anti-monotone and monotone constraints, and (iii) balances the load.
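To make the terms in the abstract concrete, the following is a minimal Python sketch, not the authors' algorithm. It assumes the expected-support model commonly used for uncertain data (items are independent, so a pattern's probability in a transaction is the product of its items' existential probabilities, and its expected support is the sum of those products over all transactions). The toy database, the `PRICE` attribute, both constraints, and all function names are hypothetical, and the map and reduce phases are simulated in a single process rather than run on a cluster.

```python
from itertools import combinations
from math import prod

# Hypothetical toy probabilistic database:
# each transaction maps an item to its existential probability.
DB = [
    {"a": 0.9, "b": 0.8, "c": 0.5},
    {"a": 0.6, "c": 0.5},
    {"b": 0.5, "c": 1.0},
]

# Hypothetical item attribute used only to express the example constraints.
PRICE = {"a": 10, "b": 20, "c": 40}


def anti_monotone(pattern):
    """Total price <= 50: once a pattern violates this, every superset does too."""
    return sum(PRICE[i] for i in pattern) <= 50


def monotone(pattern):
    """Total price >= 30: once a pattern satisfies this, every superset does too."""
    return sum(PRICE[i] for i in pattern) >= 30


def map_phase(transaction, max_len=2):
    """Emit (pattern, probability) pairs for the patterns of one transaction.

    Patterns that violate the anti-monotone constraint are dropped here,
    so they never reach the reduce phase.
    """
    items = sorted(transaction)
    for k in range(1, max_len + 1):
        for pattern in combinations(items, k):
            if anti_monotone(pattern):
                yield pattern, prod(transaction[i] for i in pattern)


def reduce_phase(pairs):
    """Sum per-transaction probabilities into an expected support per pattern."""
    totals = {}
    for pattern, p in pairs:
        totals[pattern] = totals.get(pattern, 0.0) + p
    return totals


def mine(db, minsup):
    """Return patterns that are frequent and satisfy both constraints."""
    pairs = (pair for t in db for pair in map_phase(t))
    expected_support = reduce_phase(pairs)
    return {
        pattern: support
        for pattern, support in expected_support.items()
        if support >= minsup and monotone(pattern)
    }


if __name__ == "__main__":
    for pattern, support in sorted(mine(DB, minsup=0.7).items()):
        print(pattern, round(support, 2))
```

In this sketch the anti-monotone check is applied before any pair is emitted, while the monotone check is applied only to the final frequent patterns; how the article itself pushes the constraints into the mining process and balances the load is not described in the abstract.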

Acknowledgements

This project is partially supported by NSERC (Canada) and the University of Manitoba.

Author information

Authors and Affiliations

Department of Computer Science, University of Manitoba, Winnipeg, MB, R3T 2N2, Canada
Carson Kai-Sang Leung, Richard Kyle MacKinnon & Fan Jiang

Corresponding author

Correspondence to Carson Kai-Sang Leung.

About this article

Cite this article

Leung, C.K.-S., MacKinnon, R.K. & Jiang, F. Finding efficiencies in frequent pattern mining from big uncertain data. World Wide Web 20, 571–594 (2017). https://doi.org/10.1007/s11280-016-0411-3

Received: 29 August 2015
Revised: 14 August 2016
Accepted: 22 August 2016
Published: 06 September 2016
Issue Date: May 2017
DOI: https://doi.org/10.1007/s11280-016-0411-3
