Skip to main content
Log in

Stream mining on univariate uncertain data

  • Published:
Applied Intelligence Aims and scope Submit manuscript

Abstract

In this paper, we propose mining frequent patterns from univariate uncertain data streams, which have a quantitative interval for each attribute in a transaction and a probability density function indicating the possibilities that the values in the interval appear. Many data streams comprise flows of univariate uncertain data, for example, the records of atmospheric pollution sensors, and network monitoring records. We propose two algorithms to address this issue: the ExactU2Stream algorithm and the ApproxiU2Stream algorithm. The former incrementally stores the incoming transactions, and delays the mining process until it is requested. The latter mines the transactions immediately when they arrive, and stores the derived frequent patterns. Compared with the latter, the former returns results that are more accurate, but it also requires more response time. Both algorithms utilize the sliding window scheme, which decomposes the continuous data stream into discrete, overlapping chunks. The proposed algorithms outperform the compared methods in terms of runtime and memory usage. We have applied the two proposed algorithms to the data streams recording the air quality in Taiwan; the derived frequent patterns not only show the common air quality in Taiwan but also show the extremely bad air quality when a sand storm affects Taiwan.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13
Fig. 14
Fig. 15
Fig. 16
Fig. 17
Fig. 18
Fig. 19
Fig. 20
Fig. 21
Fig. 22
Fig. 23
Fig. 24
Fig. 25
Fig. 26
Fig. 27
Fig. 28
Fig. 29
Fig. 30

Similar content being viewed by others

References

  1. Abd-Elmegid LA, El-Sharkawi ME, El-Fangary LM, Helmy YK (2010) Vertical mining of frequent patterns from uncertain data. Comput Inf Sci 3:171–179

    Google Scholar 

  2. Aggarwal CC, Han J, Yu PS (2004) On demand classification of data streams. In: Proc ACM SIGKDD int conf knowledge discovery and data mining, pp 503–508

    Google Scholar 

  3. Aggarwal CC, Li Y, Wang J, Wang J (2009) Frequent pattern mining with uncertain data. In: Proc int conf knowledge discovery and data mining, pp 29–37

    Google Scholar 

  4. Agrawal R, Srikant R (1994) Fast algorithms for mining association rules. In: Proc int conf very large data base, pp 487–499

    Google Scholar 

  5. Ahmed CF, Tanbeer SK, Jeong BS, Lee YK (2011) HUC-Prune: an efficient candidate pruning technique to mine high utility patterns. Appl Intell 34:181–198

    Article  Google Scholar 

  6. Chang JH, Lee WS (2004) A sliding window method for finding recently frequent itemsets over online data streams. J Inf Sci Eng 20:753–762

    Google Scholar 

  7. Charikar M, Chen K, Farach-Colton M (2002) Finding frequent items in data streams. In: Proc int conf automata, languages, and programming, pp 693–703

    Chapter  Google Scholar 

  8. Chu CJ, Tseng VS, Liang T (2008) An efficient algorithm for mining temporal high utility itemsets from data streams. J Syst Softw 81:1105–1117

    Article  Google Scholar 

  9. Chui C, Kao B (2008) A decremental approach for mining frequent itemsets from uncertain data. In: Proc Pacific-Asia conference on knowledge discovery and data mining, pp 64–75

    Chapter  Google Scholar 

  10. Chui C, Kao B, Hung E (2007) Mining frequent itemsets from uncertain data. In: Proc Pacific-Asia conference on knowledge discovery and data mining, pp 47–58

    Chapter  Google Scholar 

  11. Chi Y, Wang H, Yu PS, Muntz RR (2004) Moment: maintaining closed frequent itemsets over a stream sliding window. In: Proc int conf data mining, pp 59–66

    Google Scholar 

  12. Cormode G, Muthukrishnan S (2003) What’s hot and what’s not: tracking most frequent items dynamically. In: Proc SIGMOD/PODS, pp 296–306

    Google Scholar 

  13. Gaber MM, Krishnaswamy S, Zaslavsky A (2005) Onboard mining of data streams in sensor networks. In: Maulik U (ed) Advanced methods of knowledge discovery from complex data. Springer, Berlin, pp 307–335

    Chapter  Google Scholar 

  14. Giannella C, Han J, Pei J, Yan X, Yu PS (2003) Mining frequent patterns in data streams at multiple time granularities. In: Kargupta H (ed) Data mining: next generation challenges and future directions. AAAI Press/MIT Press, Melno Park/Cambridge, pp 191–210

    Google Scholar 

  15. Golab L, Dehaan D, Demaine ED, Lopez-Ortiz A, Munro JI (2003) Identifying frequent items in sliding windows over on-line packet streams. In: Proc internet measurement conference, pp 173–178

    Chapter  Google Scholar 

  16. Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. In: Proc ACM SIGMOD int conf management of data, pp 1–12

    Google Scholar 

  17. Hung CC, Peng WC (2011) A regression-based approach for mining user movement patterns from random sample data. Data Knowl Eng 70:1–20

    Article  Google Scholar 

  18. Jiang N, Gruenwald L (2006) Research issues in data stream association rule mining. SIGMOD Rec 35:14–19

    Article  Google Scholar 

  19. Jiang N, Gruenwald L (2006) CFI-stream: mining closed frequent itemsets in data streams. In: Proc int conf knowledge discovery and data mining, pp 592–597

    Google Scholar 

  20. Karp RM, Shenker S (2003) A simple algorithm for finding frequent elements in streams and bags. ACM Trans Database Syst 28:51–55

    Article  Google Scholar 

  21. Lee CH (2007) IMSP: an information theoretic approach for multi-dimensional sequential pattern mining. Appl Intell 26:231–242

    Article  MATH  Google Scholar 

  22. Leung CKS, Carmichael CL, Hao B (2007) Efficient mining of frequent patterns from uncertain data. In: Proc int conf data mining—workshops, pp 489–494

    Google Scholar 

  23. Leung CKS, Hao B (2009) Mining of frequent itemsets from streams of uncertain data. In: Proc int conf data engineering, pp 1663–1670

    Google Scholar 

  24. Leung CKS, Hao B, Jiang F (2010) Constrained frequent itemset mining from uncertain data streams. In: Proc int conf data engineering workshops, pp 120–127

    Google Scholar 

  25. Leung CKS, Khan QI (2006) DSTree: a tree structure for the mining of frequent sets from data streams. In: Proc int conf data mining, pp 928–933

    Google Scholar 

  26. Leung CKS, Mateo MAF, Brajczuk DA (2008) A tree-based approach for frequent pattern mining from uncertain data. In: Proc Pacific-Asia conference on knowledge discovery and data mining, pp 653–661

    Chapter  Google Scholar 

  27. Li CW, Jea KF, Lin RP, Yen SF, Hsu CW (2012) Mining frequent patterns from dynamic data streams with data load management. J Syst Softw 85:1346–1362

    Article  Google Scholar 

  28. Li HF, Lee SY (2009) Mining frequent itemsets over data streams using efficient window sliding techniques. Expert Syst Appl 36:1466–1477

    Article  Google Scholar 

  29. Li HF, Lee SY, Shan MK (2004) An efficient algorithm for mining frequent itemsets over the entire history of data streams. In: Proc int work knowledge discovery in data streams

    Google Scholar 

  30. Li HF, Lee SY, Shan MK (2005) Online mining (recently) maximal frequent itemsets over data streams. In: Proc int work research issues in data engineering: stream data mining and applications

    Google Scholar 

  31. Lin CH, Chiu DY, Wu YH (2005) Mining frequent itemsets from data streams with a time-sensitive sliding window. In: Proc SIAM int conf data mining

    Google Scholar 

  32. Liu YH (2012) Mining frequent patterns from univariate uncertain data. Data Knowl Eng 71:47–68

    Article  Google Scholar 

  33. Liu YH, Wang CS (2012) Constrained frequent pattern mining on univariate uncertain data. J Syst Softw. doi:10.1016/j.jss.2012.11.020

    Google Scholar 

  34. Manku GS, Motwani R (2002) Approximate frequency counts over data streams. In: Proc int conf very large data bases, pp 346–357

    Chapter  Google Scholar 

  35. Mao G, Wu X, Zhu X, Chen G, Liu C (2007) Mining maximal frequent itemsets from data streams. J Inf Sci 33:251–262

    Article  Google Scholar 

  36. Purwanto EC, Logeswaran R (2012) An enhanced hybrid method for time series prediction using linear and neural network models. Appl Intell 37:511–519

    Article  Google Scholar 

  37. Qiao S, Tang C, Jin H, Long T, Dai S, Ku Y, Chau M (2010) PutMode: prediction of uncertain trajectories in moving objects databases. Appl Intell 33:370–386

    Article  Google Scholar 

  38. Silvestri C, Orlando S (2007) Approximate mining of frequent patterns on streams. Intell Data Anal 11:49–73

    Google Scholar 

  39. Sun L, Cheng R, Cheung DW, Cheng J (2010) Mining uncertain data with probabilistic guarantees. In: Proc ACM SIGKDD int conf knowledge discovery and data mining, pp 273–282

    Google Scholar 

  40. Wang YT, Cheng JT (2011) Mining periodic movement patterns of mobile phone users based on an efficient sampling approach. Appl Intell 35:32–40

    Article  Google Scholar 

  41. Yang L, Sanver M (2004) Mining short association rules with one database scan. Proc information and knowledge engineering

  42. Yu JX, Chong Z, Lu H, Zhang Z, Zhou A (2006) A false negative approach to mining frequent itemsets from high speed transactional data streams. Inf Sci 176:1986–2015

    Article  Google Scholar 

  43. EPA website (2010). http://taqm.epa.gov.tw/taqm/zh-tw/default.aspx

  44. Xu C, Wang Y, Gu Y, Lin S, Yu G (2012) Efficient fuzzy ranking queries in uncertain databases. Appl Intell 37:47–59

    Article  Google Scholar 

  45. Zhao L, Wang L, Xu Q (2012) Data stream classification with artificial endocrine system. Appl Intell 37:390–404

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Ying-Ho Liu.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Liu, YH. Stream mining on univariate uncertain data. Appl Intell 39, 315–344 (2013). https://doi.org/10.1007/s10489-012-0415-3

Download citation

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10489-012-0415-3

Keywords

Navigation