Abstract
With advances in technology, massive amounts of valuable data can be collected and transmitted at high velocity in various scientific, biomedical, and engineering applications. Hence, scalable data analytics tools are in demand for analyzing these data. For example, scalable tools for association analysis help reveal frequently occurring patterns and their relationships, which in turn lead to intelligent decisions. While a majority of existing frequent pattern mining algorithms (e.g., FP-growth) handle only precise data, there are situations in which data are uncertain. In recent years, tree-based algorithms for mining uncertain data (e.g., UF-growth, UFP-growth) have been developed. However, the tree structures built by these algorithms can be large. Other tree structures for handling uncertain data achieve compactness, but at the expense of loose upper bounds on expected supports. In this paper, we propose (i) a compact tree structure that captures uncertain data with tighter upper bounds than the aforementioned tree structures and (ii) a scalable data analytics algorithm that mines frequent patterns from our tree structure. Experimental results show the tightness of the upper bounds on expected supports provided by our algorithm.
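To make the notion of expected support concrete, the following is a minimal sketch in Python. It assumes the usual probabilistic model in which each item in a transaction carries an existential probability and items are independent, so the expected support of a pattern is the sum, over all transactions containing it, of the product of its items' probabilities. The toy database `DB`, the function names, and the `loose_upper_bound` helper are illustrative assumptions, not the tree structure or the bound proposed in this paper. The helper only shows what a valid but loose upper bound looks like: a product of probabilities never exceeds its smallest factor.

```python
from math import prod

# Hypothetical toy database (an assumption for illustration): each transaction
# maps an item to its existential probability, i.e. the chance it is present.
DB = [
    {"a": 0.9, "b": 0.8, "c": 0.5},
    {"a": 0.6, "c": 0.4},
    {"b": 1.0, "c": 0.7},
]

def expected_support(pattern, db):
    """Sum over transactions of the product of the pattern's item probabilities.

    Assumes items within a transaction are independent.
    """
    total = 0.0
    for transaction in db:
        if all(item in transaction for item in pattern):
            total += prod(transaction[item] for item in pattern)
    return total

def loose_upper_bound(pattern, db):
    """A generic upper bound (NOT the paper's bound).

    Every probability is at most 1, so the product for a transaction cannot
    exceed the smallest probability among the pattern's items.
    """
    total = 0.0
    for transaction in db:
        if all(item in transaction for item in pattern):
            total += min(transaction[item] for item in pattern)
    return total

def is_frequent(pattern, db, minsup):
    """A pattern is frequent if its expected support reaches minsup."""
    return expected_support(pattern, db) >= minsup

def may_be_frequent(pattern, db, minsup):
    """Safe pruning test: if even the upper bound is below minsup, skip it."""
    return loose_upper_bound(pattern, db) >= minsup
```

For the pattern {a, c}, the expected support is 0.9 × 0.5 + 0.6 × 0.4 = 0.69, while the loose bound gives 0.5 + 0.4 = 0.9. A tighter bound narrows this gap, so fewer infrequent candidates survive pruning and have to be checked exactly.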
Acknowledgments
This project is partially supported by NSERC (Canada) and the University of Manitoba.
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
MacKinnon, R.K., Leung, C.K.-S., Tanbeer, S.K. (2014). A Scalable Data Analytics Algorithm for Mining Frequent Patterns from Uncertain Data. In: Peng, W.-C., et al. Trends and Applications in Knowledge Discovery and Data Mining. PAKDD 2014. Lecture Notes in Computer Science, vol 8643. Springer, Cham. https://doi.org/10.1007/978-3-319-13186-3_37
DOI: https://doi.org/10.1007/978-3-319-13186-3_37
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-13185-6
Online ISBN: 978-3-319-13186-3
eBook Packages: Computer Science, Computer Science (R0)