Mining Frequent Itemsets in Correlated Uncertain Databases

  • Regular Paper
Journal of Computer Science and Technology

Abstract

Recently, with the growing popularity of the Internet of Things (IoT) and pervasive computing, large amounts of uncertain data, such as RFID data, sensor data, and real-time video data, have been collected. As one of the most fundamental problems in uncertain data mining, uncertain frequent pattern mining has attracted much attention in the database and data mining communities. Although several solutions for uncertain frequent pattern mining exist, most of them assume that data objects are independent of one another, an assumption that does not hold in most real-world scenarios. Consequently, methods built on this independence assumption may produce inaccurate results on correlated uncertain data. In this paper, we focus on the problem of mining frequent itemsets over correlated uncertain data, where correlation may exist between any pair of uncertain data objects (transactions). We propose a novel probabilistic model, called the Correlated Frequent Probability (CFP) model, to represent the probability distribution of an itemset's support in a given correlated uncertain dataset. Based on the support distribution derived from the CFP model, we observe that some probabilistic frequent itemsets are frequent only within several highly positively correlated transactions. To reduce such redundant frequent itemsets, we further propose a new type of pattern, called global probabilistic frequent itemsets: if the whole correlated uncertain database is divided into disjoint groups of transactions based on their correlation, a global probabilistic frequent itemset is one that is frequent in every group. Such itemsets are more effective at eliminating the influence of noise and correlation in the data. To speed up the mining process, we also design a dynamic programming solution, as well as two pruning and bounding techniques. Extensive experiments on both real and synthetic datasets verify the effectiveness and efficiency of the proposed model and algorithms.
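The abstract does not give the CFP model's formulas, so the following is only a minimal sketch of the two underlying notions under simplifying assumptions. It computes an itemset's support distribution with the standard dynamic program for *independent* uncertain transactions (the baseline setting the paper argues is insufficient, not the CFP model itself), and then applies a hypothetical "global" check requiring the itemset to be probabilistic frequent in every disjoint group of transactions. The per-transaction occurrence probabilities, the grouping passed in as `groups`, the per-group threshold ceil(rel_min_sup * group size), and the probability threshold `tau` are all illustrative assumptions; transactions are treated as independent within each group.

```python
import math
from typing import List, Sequence


def support_pmf(probs: Sequence[float]) -> List[float]:
    """Return pmf with pmf[k] = Pr(support == k).

    Assumes (unlike the CFP model) that the itemset occurs in each
    transaction independently with probability probs[i]. O(n^2) DP.
    """
    pmf = [1.0]  # no transactions processed yet: support is 0 with certainty
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, q in enumerate(pmf):
            nxt[k] += q * (1.0 - p)  # itemset absent from this transaction
            nxt[k + 1] += q * p      # itemset present in this transaction
        pmf = nxt
    return pmf


def frequent_probability(probs: Sequence[float], min_sup: int) -> float:
    """Pr(support >= min_sup) under the independence assumption."""
    return sum(support_pmf(probs)[min_sup:])


def is_probabilistic_frequent(probs: Sequence[float], min_sup: int, tau: float) -> bool:
    """An itemset is probabilistic frequent if Pr(support >= min_sup) >= tau."""
    return frequent_probability(probs, min_sup) >= tau


def is_global_probabilistic_frequent(
    groups: Sequence[Sequence[float]], rel_min_sup: float, tau: float
) -> bool:
    """Hypothetical global check: probabilistic frequent in every disjoint group.

    The grouping is taken as given (the paper derives it from correlation),
    and each group's absolute threshold is ceil(rel_min_sup * group size),
    which is an assumption of this sketch.
    """
    for group in groups:
        min_sup = math.ceil(rel_min_sup * len(group))
        if not is_probabilistic_frequent(group, min_sup, tau):
            return False
    return True
```

For example, with groups [[0.9, 0.8], [0.1, 0.2]], rel_min_sup = 0.5 and tau = 0.5, the pooled database [0.9, 0.8, 0.1, 0.2] is probabilistic frequent (Pr(support >= 2) is about 0.79), yet the global check fails because the second group alone gives Pr(support >= 1) = 0.28. This mirrors, in simplified form, the abstract's observation that an itemset can look frequent overall while being supported by only part of the database.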

Author information

Correspondence to Lei Chen.

About this article

Cite this article

Tong, YX., Chen, L. & She, J. Mining Frequent Itemsets in Correlated Uncertain Databases. J. Comput. Sci. Technol. 30, 696–712 (2015). https://doi.org/10.1007/s11390-015-1555-9

  • DOI: https://doi.org/10.1007/s11390-015-1555-9