Abstract
The problem of mining frequent itemsets from uncertain data (uFIM) has attracted attention in recent years. Most of the work in this field is based on the assumption of stochastic independence, which is clearly unjustified in many real-world applications of uFIM. To address this problem, we introduce a new general model for expressing dependencies in frequent itemset mining. We show that mining itemsets in the general model is NP-complete, but give an efficient algorithm based on dynamic programming to mine itemsets in a simplified version of this model. Our experimental results show that assuming independence in correlated data sets leads to substantially incorrect results.
Notes
- 1. Mathematically, we assume that \(t_A.Music\) and \(t_B.Music\) are independent of \(t_A.Game\) and \(t_B.Video\).
- 2. \(Jaccard(A,B) = \frac{|A \cap B|}{|A \cup B|}\), where \(A\) and \(B\) are not empty and \(0 \le Jaccard(A,B) \le 1\).
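The Jaccard coefficient in note 2 can be illustrated with a minimal sketch, assuming the two collections are represented as Python sets. The function name `jaccard` and the example item sets are hypothetical, chosen only for illustration, and are not taken from the paper.

```python
def jaccard(a: set, b: set) -> float:
    """Return |A ∩ B| / |A ∪ B| for two non-empty sets."""
    if not a or not b:
        # The note defines the coefficient only for non-empty sets.
        raise ValueError("both sets must be non-empty")
    return len(a & b) / len(a | b)


# Hypothetical item sets (assumption, for illustration only).
t_a = {"Music", "Game"}
t_b = {"Music", "Video"}

# One shared item out of three distinct items overall.
similarity = jaccard(t_a, t_b)
```

The result always lies in the interval [0, 1]: it is 0 when the sets are disjoint and 1 when they are identical.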
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Kalaz, Y.A., Raman, R. (2018). Frequent Itemset Mining on Correlated Probabilistic Databases. In: Hartmann, S., Ma, H., Hameurlain, A., Pernul, G., Wagner, R. (eds) Database and Expert Systems Applications. DEXA 2018. Lecture Notes in Computer Science, vol 11030. Springer, Cham. https://doi.org/10.1007/978-3-319-98812-2_6
DOI: https://doi.org/10.1007/978-3-319-98812-2_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-98811-5
Online ISBN: 978-3-319-98812-2
eBook Packages: Computer Science, Computer Science (R0)