Frequent Itemset Mining on Correlated Probabilistic Databases

Kalaz, Yasemin Asan; Raman, Rajeev

doi:10.1007/978-3-319-98812-2_6

Conference paper
First Online: 09 August 2018

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11030)

Abstract

The problem of mining frequent itemsets from uncertain data (uFIM) has attracted attention in recent years. Most of the work in this field is based on the assumption of stochastic independence, which is clearly unjustified in many real-world applications of uFIM. To address this problem, we introduce a new general model for expressing dependencies in frequent itemset mining. We show that mining itemsets in the general model is NP-complete, but give an efficient algorithm based on dynamic programming to mine itemsets in a simplified version of this model. Our experimental results show that assuming independence in correlated data sets leads to substantially incorrect results.
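
The closing claim of the abstract can be illustrated with a toy calculation. The sketch below is not the model or the algorithm of the paper. It assumes that each uncertain transaction is given as a small joint table over the sets of items that may be present, and it uses expected support as the frequency measure. All names (marginals, expected_support_independent, expected_support_joint) are hypothetical.

```python
def marginals(joint):
    """Per-item marginal probabilities implied by one transaction's joint table."""
    probs = {}
    for world, p in joint.items():
        for item in world:
            probs[item] = probs.get(item, 0.0) + p
    return probs


def expected_support_independent(joints, itemset):
    """Expected support when items are (wrongly) treated as independent:
    for each transaction, multiply the marginal probabilities of the items."""
    total = 0.0
    for joint in joints:
        m = marginals(joint)
        p = 1.0
        for item in itemset:
            p *= m.get(item, 0.0)
        total += p
    return total


def expected_support_joint(joints, itemset):
    """Expected support computed from the joint tables themselves:
    sum the probability of every possible item set that contains the itemset."""
    return sum(
        p
        for joint in joints
        for world, p in joint.items()
        if itemset <= world
    )


# Assumed toy data: in each transaction, items "a" and "b" are either both
# present (probability 0.5) or both absent (probability 0.5).
perfectly_correlated = {frozenset({"a", "b"}): 0.5, frozenset(): 0.5}
db = [perfectly_correlated, perfectly_correlated]
x = frozenset({"a", "b"})

indep = expected_support_independent(db, x)  # 0.5 * 0.5 per transaction
exact = expected_support_joint(db, x)        # 0.5 per transaction
print(indep, exact)
```

In this toy database the independence assumption reports half of the true expected support of {a, b}, so a threshold between the two values would wrongly classify the itemset as infrequent.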

Notes

1. Mathematically, we assume that \(t_A.Music\) and \(t_B.Music\) are independent of \(t_A.Game\) and \(t_B.Video\).
2. \(Jaccard(A,B) = \frac{|A \cap B|}{|A \cup B|}\), where \(A\) and \(B\) are not empty and \(0 \le Jaccard(A,B) \le 1\).
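
The Jaccard definition in note 2 can be written as a short function. This is a minimal sketch; the function name jaccard and the choice to raise an error on empty input are assumptions made here to mirror the non-emptiness condition in the note.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two non-empty collections."""
    a, b = set(a), set(b)
    if not a or not b:
        # The note defines the measure only for non-empty A and B.
        raise ValueError("jaccard is defined here only for non-empty sets")
    return len(a & b) / len(a | b)


# Two of the four distinct elements are shared, so the similarity is 2/4.
example = jaccard({1, 2, 3}, {2, 3, 4})
print(example)
```

Identical sets give 1 and disjoint sets give 0, which matches the stated range \(0 \le Jaccard(A,B) \le 1\).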

Author information

Authors and Affiliations

University of Leicester, Leicester, LE1 7RH, UK
Yasemin Asan Kalaz & Rajeev Raman

Corresponding author

Correspondence to Yasemin Asan Kalaz.

Editor information

Editors and Affiliations

Clausthal University of Technology, Clausthal-Zellerfeld, Germany
Sven Hartmann
Victoria University of Wellington, Wellington, New Zealand
Hui Ma
Paul Sabatier University, Toulouse, France
Abdelkader Hameurlain
University of Regensburg, Regensburg, Germany
Günther Pernul
Johannes Kepler University, Linz, Austria
Roland R. Wagner

About this paper

Cite this paper

Kalaz, Y.A., Raman, R. (2018). Frequent Itemset Mining on Correlated Probabilistic Databases. In: Hartmann, S., Ma, H., Hameurlain, A., Pernul, G., Wagner, R. (eds) Database and Expert Systems Applications. DEXA 2018. Lecture Notes in Computer Science, vol 11030. Springer, Cham. https://doi.org/10.1007/978-3-319-98812-2_6

DOI: https://doi.org/10.1007/978-3-319-98812-2_6
Published: 09 August 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-98811-5
Online ISBN: 978-3-319-98812-2
eBook Packages: Computer Science, Computer Science (R0)
