
Finding efficiencies in frequent pattern mining from big uncertain data

World Wide Web

Abstract

Many existing data mining algorithms search for interesting patterns in transactional databases of precise data. However, there are situations in which data are uncertain. Items in each transaction of these probabilistic databases of uncertain data are usually associated with existential probabilities, which express the likelihood that these items are present in the transaction. Compared with mining precise data, the search space for mining uncertain data is much larger because of these existential probabilities. The problem worsens as we move into the era of Big data. Furthermore, in many real-life applications, users may be interested in only a tiny portion of this large search space. Without opportunities for users to express which patterns they want mined, many existing data mining algorithms return numerous patterns, of which only some are interesting. In this article, we propose an algorithm that allows users to express their interest in terms of constraints, uses the MapReduce model to mine uncertain Big data for frequent patterns that satisfy the user-specified anti-monotone and monotone constraints, and balances the load.
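To make the terms above concrete, here is a minimal Python sketch under stated assumptions; it is not the authors' algorithm. It assumes, as is common in uncertain frequent pattern mining, that items occur independently, so the expected support of an itemset X is the sum, over all transactions, of the product of the existential probabilities of the items of X in that transaction. The toy database `DB`, the `PRICE` attribute, the functions `map_phase`, `reduce_phase` and `mine`, and the two price constraints are all hypothetical. The map and reduce steps are only simulated in a single process, and every itemset is enumerated by brute force, which is viable only for tiny examples.

```python
from collections import defaultdict
from itertools import combinations
from math import prod

# Hypothetical toy probabilistic database: each transaction maps an item
# to its existential probability.
DB = [
    {"a": 0.9, "b": 0.7, "c": 0.2},
    {"a": 0.8, "c": 0.9},
    {"a": 0.5, "b": 0.4, "d": 1.0},
]

# Hypothetical item attribute that the constraints refer to.
PRICE = {"a": 10, "b": 40, "c": 25, "d": 60}


def map_phase(transaction, max_len):
    """Emit (itemset, probability that the itemset occurs in this transaction),
    assuming items occur independently of one another."""
    items = sorted(transaction)
    for k in range(1, max_len + 1):
        for itemset in combinations(items, k):
            yield itemset, prod(transaction[i] for i in itemset)


def reduce_phase(pairs):
    """Sum per-transaction probabilities into the expected support of each itemset."""
    expsup = defaultdict(float)
    for itemset, p in pairs:
        expsup[itemset] += p
    return expsup


def mine(db, minsup, max_len=3, budget=50, min_price=20):
    # Anti-monotone constraint max(price) <= budget: an item priced above
    # the budget makes every itemset containing it invalid, so drop it
    # before any itemsets are enumerated.
    pairs = []
    for t in db:
        kept = {i: p for i, p in t.items() if PRICE[i] <= budget}
        pairs.extend(map_phase(kept, max_len))
    expsup = reduce_phase(pairs)
    # Monotone constraint max(price) >= min_price: once an itemset satisfies
    # it, so does every superset. In this sketch it is checked after counting.
    return {
        x: round(s, 4)
        for x, s in expsup.items()
        if s >= minsup and max(PRICE[i] for i in x) >= min_price
    }


if __name__ == "__main__":
    print(mine(DB, minsup=0.5))
```

With these toy values, `mine(DB, minsup=0.5)` keeps {b}, {c}, {a, b} and {a, c}. Item d is pruned before counting because it violates the anti-monotone constraint, and {a} is frequent in expectation (expected support 2.2) but fails the monotone constraint. Checking the anti-monotone constraint in the map step shrinks the search space early, which is the general motivation for pushing such constraints into the mining process.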





Acknowledgements

This project is partially supported by NSERC (Canada) and the University of Manitoba.

Author information

Correspondence to Carson Kai-Sang Leung.

About this article

Cite this article

Leung, C.K.-S., MacKinnon, R.K. & Jiang, F. Finding efficiencies in frequent pattern mining from big uncertain data. World Wide Web 20, 571–594 (2017). https://doi.org/10.1007/s11280-016-0411-3

DOI: https://doi.org/10.1007/s11280-016-0411-3
