Abstract
While a number of efficient sequential pattern mining algorithms were developed over the years, they can still take a long time and produce a huge number of patterns, many of which are redundant. These properties are especially frustrating when the goal of pattern mining is to find patterns for use as features in classification problems. In this paper, we describe BIDE-Discriminative, a modification of BIDE that uses class information for direct mining of predictive sequential patterns. We then perform an extensive evaluation on nine real-life datasets of the different ways in which the basic BIDE-Discriminative can be used in real multi-class classification problems, including 1-versus-rest and model-based search tree approaches. The results of our experiments show that 1-versus-rest provides an efficient solution with good classification performance.





Similar content being viewed by others
References
Agrawal R, Imielinski T, Swami AN (1993) Mining association rules between sets of items in large databases. In: Proceedings of the 1993 ACM SIGMOD international ACM Press, conference on management of data, pp 207–216
Agrawal R, Srikant R (1995) Mining sequential patterns. In: ICDE. IEEE Press, pp 3–14
Asuncion A, Newman D (n.d.) UCI Machine Learning Repository
Batal I, Fradkin D, Harrison J, Moerchen F, Hauskrecht M (2012) Mining recent temporal patterns for event detection in multivariate time series data. In: Proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining, pp 280–288. doi:10.1145/2339530.2339578
Batal I, Valizadegan H, Cooper GF, Hauskrecht M (2011) A pattern mining approach for classifying multivariate temporal data. In: Proceedings of the 2011 IEEE international conference on bioinformatics and biomedicine, pp 358–365. doi:10.1109/BIBM.2011.39
Bringmann B, Zimmermann A (2008) One in a million: picking the right patterns. Knowl Inf Syst 18(1):61–81
Bringmann B, Zimmermann A, Raedt L, Nijssen S (2006) Dont be afraid of simpler patterns. In: Frnkranz J, Scheffer T, Spiliopoulou M (eds) Knowledge discovery in databases: PKDD 2006, vol 4213 of LNCS. Springer, Berlin, pp 55–66. doi:10.1007/11871637_10
Buza K, Schmidt-Thieme L (2010) Motif-based classification of time series with bayesian networks and svms. In: Fink A, Lausen B, Seidel W, Ultsch A (eds) Advances in data analysis, data handling and business intelligence. Studies in classification, data analysis, and knowledge organization. Springer, Berlin, pp 105–114. doi:10.1007/978-3-642-01044-6_9
Carbonell J, Coldstein J (1998) The use of MMR, diversity-based reranking for reordering documents and producing summaries. In: Proceedings of SIGIR, p 335336
Cheng H, Yan X, Han J, Hsu C-W (2007) Discriminative frequent pattern analysis for effective classification. In: Proceedings of the IEEE ICDE
Cheng H, Yan X, Han J, Yu PS (2008) Direct discriminative pattern mining for effective classification. In: ICDE, pp 169–178
Cover TM, Thomas JA (2006) Elements of information theory, 2nd edn. Wiley, New York
Dong G, Pei J (2007) Sequence data mining. Morgan Kaufmann, Burlington
Fan R-E, Chang K-W, Hsieh C-J, Wang X-R, Lin C-J (2008) Liblinear: a library for large linear classification. J Mach Learn Res 9:1871–1874
Fan W, Zhang K, Cheng H, Gao J, Yan X, Han J, Yu PS, Verscheure O (2008) Direct mining of discriminative and essential frequent patterns via model-based search tree. In: KDD, pp 230–238
Fern A (2004) Learning models and formulas of a temporal event logic. PhD thesis, Purdue University, West Lafayette, IN, USA
Fradkin D, Moerchen F (2010) Margin-closed frequent sequential pattern mining. KDD workshop on useful patterns. ACM, New York, NY, USA, pp 45–54
Grahne G, Zhu J (2003) Efficiently using prefix-trees in mining frequent itemsets. In: ICDM workshop on frequent itemset mining implementations
Guyon I, Elisseeff A (2003) An introduction to variable and feature selection. J Mach Learn Res 3:1157–1182
Han J, Kamber M (2006) Data mining: concepts and techniques, 2nd edn. Morgan Kaufmann, Burlington
Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. In: Proceedings of the ACM SIGMOD international conference on management of data. ACM Press, pp 1–12
Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. In: SIGMOD, pp 1–12
Ifrim G, Bakir GH, Weikum G (2008) Fast logistic regression for text categorization with variable-length n-grams. In: KDD, pp 354–362
Ifrim G, Wiuf C (2011) Bounded coordinate-descent for biological sequence classification in high dimensional predictor space. In: KDD
Kadous MW (2002) Temporal classification: extending the classification paradigm to multivariate time series. PhD thesis, University of New South Wales
Kerr W, Cohen P, Chang Y-H (2008) Learning and playing in wubble world. In: Proceedings of the fourth artificial intelligence and interactive digital entertainment conference, pp 66–71
Knobbe AJ, Ho EKY (2006) Pattern teams. In: PKDD, pp 577–584
Lee J-G, Han J, Li X, Cheng H (2011) Mining discriminative patterns for classifying trajectories on road networks. IEEE Trans Knowl Data Eng 23(5):713–726
Lin J, Keogh E, Lonardi S, Chiu B (2003) A symbolic representation of time series, with implications for streaming algorithms. In: Proceedings of the 2003 ACM SIGMOD workshop on research issues in data mining and knowledge discovery. ACM Press, pp 2–11. URL:http://citeseer.ist.psu.edu/583097.html
Lo D, Cheng H, Cia L (2011) Mining closed discriminative dyadic sequential patterns. In: EDBT
Lo D, Han J, Cheng H, Khoo S-C, Sun C (2009) Classification of software behaviros for failure detection: a discriminative pattern mining approach. In: Proceedings of KDD
Lucchese C, Orlando S, Perego R (2006) Fast and memory efficient mining of frequent closed itemsets. IEEE Trans Knowl Data Eng 18(1):21–36
Mäntyjärvi J, Himberg J, Kangas P, Tuomela U, Huuskonen P (2004) Sensor signal data set for exploring context recognition of mobile devices. In: Proceedings of PERVASIVE. Springer, pp 18–23
Moerchen F, Thies M, Ultsch A (2011) Efficient mining of all margin-closed itemsets with applications in temporal knowledge discovery and classification by compression. Knowl Inf Syst 29:55–80. doi:10.1007/s10115-010-0329-5
Mörchen F, Ultsch A (2005) Optimizing time series discretization for knowledge discovery. In: Proceedings of the ACM SIGKDD. ACM Press, pp 660–665
Mörchen F, Ultsch A (2007) Efficient mining of understandable patterns from multivariate interval time series. Data Min Knowl Discov 15(2):181–215. doi:10.1007/s10618-007-0070-1
Morishita S, Sese J (2000) Traversing itemset lattice with statistical metric pruning. In: PODS, pp 226–236
Nijssen S, Kok J (2006) Multi-class correlated pattern mining. In: Bonchi F, Boulicaut J-F (eds) Knowledge discovery in inductive databases, vol 3933 of LNCS. Springer, Berlin, pp 165–187. doi:10.1007/11733492_10
Ohara K, Hara M, Takabayashi K, Motoda H, Washio T (2008) Pruning strategies based on the upper bound of information gain for discriminative subgraph mining. In: PKAW’08, pp 50–60
Papaterou P, Kollios G, Sclaroff S, Gunopoulos D (2005) Discovering frequent arrangements of temporal intervals. In: ICDM, pp 354–361
Pei J, Han J, Mortazavi-Asl B, Pinto H, Chen Q, Dayal U, Hsu M-C (2001) PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth. In: Proceedings of the IEEE ICDE. IEEE Press, pp 215–224
Sese J, Morishita S (2004) Itemset classified clustering, In: Boulicaut J-F, Esposito F, Giannotti F, Pedreschi D (eds) Knowledge discovery in databases: PKDD 2004, vol 3202 of LNCS. Springer, Berlin, pp 398–409. doi:10.1007/978-3-540-30116-5_37
Sipos R, Fradkin D, Moerchen F, Wang Z (2014) Log-based predictive maintenance, In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining, pp 1867–1876. doi:10.1145/2623330.2623340
Srikant R, Agrawal R (1996) Mining sequential patterns: generalizations and performance improvements. In: Proceedings of the 5th international conference on extending database technology (EDBT). Springer, pp 3–17. URL:http://citeseer.ist.psu.edu/article/srikant96mining.html
Starner T, Weaver J, Pentland A (1998) Real-time American sign language recognition using desk and wearable computer-based video. IEEE Trans Pattern Anal Mach Intell 20(12):1371–1375. doi:10.1109/34.735811
Wang J, Han J (2004) BIDE: Efficient mining of frequent closed sequences. In: ICDE. IEEE Press, pp 79–90
Wang J, Han J, Li C (2007) Frequent closed sequence mining without candidate maintenance. IEEE Trans Knowl Data Eng 19(8):1042–1056
Wu S-Y, Chen Y-L (2007) Mining nonambiguous temporal patterns for interval-based events. IEEE Trans Knowl Data Eng 19(6):742–758
Xu W, Huang L, Fox A, Patterson D, Jordan M (2008) Mining console logs for large-scale system problem detection. In: Proceedings of the 3rd workshop on tackling computer systems problems with machine learning techniques
Yan X, Han J (2002) gspan: Graph-based substructure pattern mining. In: ICDM
Yang Y, Pedersen J (1997) A comparative study on feature selection in text categorization. In: ICML, pp 412–420
Zaki M (2001) Spade: an efficient algorithm for mining frequent sequences. Mach Learn 42:31–60
Zaki MJ, Hsiao C-J (2002) CHARM: an efficient algorithm for closed itemset mining. In: Proceedings of the 2nd SIAM international conference on data mining (SDM), SIAM, pp 457–473
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Fradkin, D., Mörchen, F. Mining sequential patterns for classification. Knowl Inf Syst 45, 731–749 (2015). https://doi.org/10.1007/s10115-014-0817-0
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10115-014-0817-0