Efficient Mining of Weighted Frequent Itemsets in Uncertain Databases

Lin, Jerry Chun-Wei; Gan, Wensheng; Fournier-Viger, Philippe; Hong, Tzung-Pei

doi:10.1007/978-3-319-41920-6_18

Conference paper
First Online: 28 June 2016

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9729)

Abstract

Frequent itemset mining (FIM) is a fundamental set of techniques used to discover useful and meaningful relationships between items in transaction databases. Recently, extensions of FIM such as weighted frequent itemset mining (WFIM) and frequent itemset mining in uncertain databases (UFIM) have been proposed. WFIM considers that items may have different weights (importance), and UFIM takes into account that data collected in a real-life environment may often be inaccurate, imprecise, or incomplete. A two-phase Apriori-based approach called HEWI-Uapriori was recently proposed to consider both item weight and uncertainty when mining high expected weighted itemsets (HEWIs), but it generates a large number of candidates and is too time-consuming. In this paper, a more efficient algorithm named HEWI-Utree is developed to mine HEWIs without performing multiple database scans and without generating an enormous number of candidates. It relies on three novel structures named element (E)-table, weighted-probability (WP)-table, and WP-tree to maintain the information required for identifying and pruning unpromising itemsets early. Experimental results show that the proposed algorithm is more efficient than traditional WFIM and UFIM methods, as well as the HEWI-Uapriori algorithm, in terms of runtime, memory usage, and scalability.
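
To make the notion of a high expected weighted itemset concrete, the following Python sketch scores itemsets by brute force on a toy uncertain database. It is not HEWI-Utree or HEWI-Uapriori, and it uses none of the E-table, WP-table, or WP-tree structures. The abstract does not state the formulas, so the definitions here are assumptions commonly used in this line of work: an itemset's weight is the average of its item weights, its probability in a transaction is the product of its item probabilities, and an itemset qualifies when weight times expected support reaches a minimum ratio of the database size. The data, the weights, and all function names are hypothetical.

```python
from itertools import combinations
from math import prod

# Hypothetical toy data: each transaction maps item -> existential probability.
database = [
    {"A": 0.9, "B": 0.5},
    {"A": 0.6, "C": 1.0},
    {"A": 0.8, "B": 0.5, "C": 0.4},
]
# Hypothetical item weights in (0, 1].
weights = {"A": 0.2, "B": 0.8, "C": 0.5}


def itemset_weight(itemset):
    """Assumed definition: average weight of the items in the itemset."""
    return sum(weights[i] for i in itemset) / len(itemset)


def expected_support(itemset, db):
    """Assumed definition: sum, over the transactions containing the whole
    itemset, of the product of its item probabilities."""
    return sum(
        prod(t[i] for i in itemset)
        for t in db
        if all(i in t for i in itemset)
    )


def expected_weighted_support(itemset, db):
    """Assumed definition: itemset weight times expected support."""
    return itemset_weight(itemset) * expected_support(itemset, db)


def mine_hewis_bruteforce(db, min_ratio):
    """Enumerate every itemset and keep those whose expected weighted
    support reaches min_ratio * len(db). Exponential in the number of
    items, so it is only suitable for illustration."""
    items = sorted({i for t in db for i in t})
    threshold = min_ratio * len(db)
    result = {}
    for k in range(1, len(items) + 1):
        for itemset in combinations(items, k):
            score = expected_weighted_support(itemset, db)
            if score >= threshold:
                result[itemset] = score
    return result


if __name__ == "__main__":
    for itemset, score in mine_hewis_bruteforce(database, 0.15).items():
        print(itemset, round(score, 3))
```

Enumerating every itemset like this grows exponentially with the number of items, which is the kind of cost that candidate-pruning approaches such as the one described above aim to avoid.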

Author information

Authors and Affiliations

School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, China
Jerry Chun-Wei Lin & Wensheng Gan
School of Natural Sciences and Humanities, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, China
Philippe Fournier-Viger
Department of Computer Science and Information Engineering, National University of Kaohsiung, Kaohsiung, Taiwan
Tzung-Pei Hong
Department of Computer Science and Engineering, National Sun Yat-sen University, Kaohsiung, Taiwan
Tzung-Pei Hong

Corresponding author

Correspondence to Jerry Chun-Wei Lin.

Editor information

Editors and Affiliations

IBaI, Inst of Comp Vision and applied Comp Sci, Leipzig, Sachsen, Germany
Petra Perner

About this paper

Cite this paper

Lin, J.CW., Gan, W., Fournier-Viger, P., Hong, TP. (2016). Efficient Mining of Weighted Frequent Itemsets in Uncertain Databases. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2016. Lecture Notes in Computer Science, vol 9729. Springer, Cham. https://doi.org/10.1007/978-3-319-41920-6_18

DOI: https://doi.org/10.1007/978-3-319-41920-6_18
Published: 28 June 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-41919-0
Online ISBN: 978-3-319-41920-6
eBook Packages: Computer Science, Computer Science (R0)