BigSAM: Mining Interesting Patterns from Probabilistic Databases of Uncertain Big Data

  • Conference paper
Trends and Applications in Knowledge Discovery and Data Mining (PAKDD 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8643)

Abstract

Nowadays, high volumes of valuable uncertain data can be easily collected or generated at high velocity in many real-life applications. Mining these uncertain Big data is computationally intensive because every item in every transaction carries an existential probability value. Each existential probability value expresses the likelihood that the item is present in a particular transaction. In some situations, users may be interested in mining all frequent patterns from these uncertain Big data; in other situations, users may be interested in only a tiny portion of the mined patterns. To reduce the computation and focus the mining in the latter situations, we propose a tree-based algorithm that (i) allows users to express the patterns to be mined, according to their intention, via constraints and (ii) uses MapReduce to mine uncertain Big data for only those frequent patterns that satisfy the user-specified constraints. Experimental results show the effectiveness of our algorithm in mining probabilistic databases of uncertain Big data.
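The ideas named in the abstract can be illustrated with a minimal sketch. It is not the BigSAM algorithm itself, which is tree-based and is not described in this preview. The sketch assumes the expected-support measure commonly used for uncertain data (the sum, over transactions, of the product of the existential probabilities of the items in a pattern) and a constraint that can be checked one item at a time. The toy database and the names `expected_support`, `map_phase`, `reduce_phase`, and `frequent_valid_items` are hypothetical, and the map and reduce steps run in a single process rather than on a MapReduce cluster.

```python
from collections import defaultdict
from math import prod

# Hypothetical toy database: each transaction maps an item to its
# existential probability in that transaction.
DB = [
    {"a": 0.9, "b": 0.5, "c": 0.2},
    {"a": 0.6, "c": 0.8},
    {"b": 0.7, "c": 0.4},
]

def expected_support(pattern, db):
    """Sum, over transactions containing the whole pattern, of the
    product of the existential probabilities of the pattern's items."""
    return sum(
        prod(t[x] for x in pattern)
        for t in db
        if all(x in t for x in pattern)
    )

def map_phase(transaction, constraint):
    """Emit (item, probability) only for items that satisfy the constraint.
    Assumption: the constraint can be evaluated on a single item."""
    for item, p in transaction.items():
        if constraint(item):
            yield item, p

def reduce_phase(pairs):
    """Sum the emitted probabilities per item, giving the expected
    support of each single-item pattern."""
    totals = defaultdict(float)
    for item, p in pairs:
        totals[item] += p
    return dict(totals)

def frequent_valid_items(db, constraint, minsup):
    """Single-process stand-in for one map/reduce round: keep the valid
    items whose expected support reaches the threshold."""
    pairs = (kv for t in db for kv in map_phase(t, constraint))
    return {i: s for i, s in reduce_phase(pairs).items() if s >= minsup}
```

For example, `frequent_valid_items(DB, lambda item: item != "b", 1.0)` keeps only items `a` and `c`, because `b` is filtered out in the map step before any support is accumulated. Pushing the constraint into the map step is the design point: invalid items never reach the reducer, which is one plausible way to reduce the computation the abstract refers to.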

Acknowledgments

This project is partially supported by NSERC (Canada) and the University of Manitoba.

Author information

Corresponding author

Correspondence to Carson Kai-Sang Leung.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Jiang, F., Leung, C.K.-S., MacKinnon, R.K. (2014). BigSAM: Mining Interesting Patterns from Probabilistic Databases of Uncertain Big Data. In: Peng, W.-C., et al. Trends and Applications in Knowledge Discovery and Data Mining. PAKDD 2014. Lecture Notes in Computer Science, vol 8643. Springer, Cham. https://doi.org/10.1007/978-3-319-13186-3_70

  • DOI: https://doi.org/10.1007/978-3-319-13186-3_70

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-13185-6

  • Online ISBN: 978-3-319-13186-3

  • eBook Packages: Computer Science, Computer Science (R0)
