DOI: 10.1145/2452376.2452403
research-article

Mining frequent serial episodes over uncertain sequence data

Published: 18 March 2013

Abstract

Data uncertainty has posed many unique challenges to nearly all types of data mining tasks, creating a need for uncertain data mining. In this paper, we focus on the particular task of mining probabilistic frequent serial episodes (P-FSEs) from uncertain sequence data, which applies to many real applications including sensor readings as well as customer purchase sequences. We first define the notion of P-FSEs, based on the frequentness probabilities of serial episodes under possible world semantics. To discover P-FSEs over an uncertain sequence, we propose: 1) an exact approach that computes the accurate frequentness probabilities of episodes; 2) an approximate approach that approximates the frequency of episodes using probability models; 3) an optimized approach that efficiently prunes a candidate episode by estimating an upper bound of its frequentness probability using approximation techniques.
We conduct extensive experiments to evaluate the performance of the developed data mining algorithms. Our experimental results show that: 1) while existing research demonstrates that approximate approaches are orders of magnitude faster than exact approaches, for P-FSE mining the efficiency improvement of the approximate approach over the exact approach is marginal; 2) although the normal-distribution-based approximation is recognized as fairly accurate when the data set is large enough, for P-FSE mining the binomial-distribution-based approximation achieves higher accuracy when the number of episode occurrences is limited; 3) the optimized approach clearly outperforms the other two approaches in terms of runtime, and achieves very high accuracy.
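
To make the notion of a frequentness probability concrete, the sketch below compares an exact computation with a normal approximation. It is only an illustration under a simplifying assumption that the abstract does not state: each potential occurrence of an episode is treated as an independent Bernoulli event with a known probability. The paper's own model for serial episodes may differ, and the function names `exact_frequentness` and `normal_frequentness` are hypothetical.

```python
import math


def exact_frequentness(probs, min_sup):
    """P(count >= min_sup), where count is a sum of independent Bernoulli(p_i).

    Assumption for illustration only: occurrences are independent.
    """
    # dist[k] = probability that exactly k of the occurrences seen so far exist
    dist = [1.0]
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - p)   # this occurrence is absent
            nxt[k + 1] += mass * p       # this occurrence is present
        dist = nxt
    return sum(dist[min_sup:])


def normal_frequentness(probs, min_sup):
    """Normal approximation of the same probability, with continuity correction."""
    mu = sum(probs)
    var = sum(p * (1.0 - p) for p in probs)
    if var == 0.0:
        # Degenerate case: the count is deterministic
        return 1.0 if mu >= min_sup else 0.0
    z = (min_sup - 0.5 - mu) / math.sqrt(var)
    # Upper tail of the standard normal: 1 - Phi(z)
    return 0.5 * math.erfc(z / math.sqrt(2.0))


if __name__ == "__main__":
    few = [0.9, 0.2, 0.6]      # few occurrences: the approximation can drift
    many = [0.5] * 100         # many occurrences: the two values are close
    print(exact_frequentness(few, 2), normal_frequentness(few, 2))
    print(exact_frequentness(many, 50), normal_frequentness(many, 50))
```

The exact routine costs time quadratic in the number of occurrences, while the approximation needs only a mean and a variance, which is the usual motivation for model-based approximations in the itemset setting.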

Published In

EDBT '13: Proceedings of the 16th International Conference on Extending Database Technology
March 2013
793 pages
ISBN: 9781450315975
DOI: 10.1145/2452376
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. frequent serial episodes
  2. uncertain sequences

Qualifiers

  • Research-article

Conference

EDBT/ICDT '13

Acceptance Rates

Overall Acceptance Rate 7 of 10 submissions, 70%

Cited By

  • (2024) Discovering frequent parallel episodes in complex event sequences by counting distinct occurrences. Applied Intelligence 54:1 (701-721). DOI: 10.1007/s10489-023-05187-y. Online publication date: 1-Jan-2024
  • (2020) Efficient Mining of Outlying Sequence Patterns for Analyzing Outlierness of Sequence Data. ACM Transactions on Knowledge Discovery from Data 14:5 (1-26). DOI: 10.1145/3399671. Online publication date: 5-Aug-2020
  • (2020) Multi-Source Data Stream Online Frequent Episode Mining. IEEE Access 8 (107465-107478). DOI: 10.1109/ACCESS.2020.2997337. Online publication date: 2020
  • (2019) Mining Entropy Optimized Parameter based Precise positioning Episode Rules from Event Sequences. 2019 11th International Conference on Advanced Computing (ICoAC) (225-231). DOI: 10.1109/ICoAC48765.2019.247131. Online publication date: Dec-2019
  • (2018) Mining sequential patterns from probabilistic databases. Knowledge and Information Systems 44:2 (325-358). DOI: 10.1007/s10115-014-0766-7. Online publication date: 30-Dec-2018
  • (2017) Applying episode mining and pruning to identify malicious online attacks. Computers & Electrical Engineering 59 (180-188). DOI: 10.1016/j.compeleceng.2015.08.015. Online publication date: Apr-2017
  • (2017) Sequential pattern mining in databases with temporal uncertainty. Knowledge and Information Systems 51:3 (821-850). DOI: 10.1007/s10115-016-0977-1. Online publication date: 1-Jun-2017
  • (2016) Two-Phase Mining for Frequent Closed Episodes. Web-Age Information Management (55-66). DOI: 10.1007/978-3-319-39937-9_5. Online publication date: 28-May-2016
  • (2016) Distributed Sequential Pattern Mining in Large Scale Uncertain Databases. Proceedings, Part II, of the 20th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining - Volume 9652 (17-29). DOI: 10.1007/978-3-319-31750-2_2. Online publication date: 19-Apr-2016
  • (2015) Using Serial Episode Mining to Identify Internet Attacks. Applied Mechanics and Materials 764-765 (988-991). DOI: 10.4028/www.scientific.net/AMM.764-765.988. Online publication date: May-2015
