Abstract
Finding periodical regularities in sequential databases is an important topic in Knowledge Discovery. In pattern mining, such regularities are modeled as partially periodic patterns, for which typical periods (e.g., daily or weekly) can be considered. Although efficient mining algorithms have been studied, applying them to real databases is still challenging, because real databases are noisy and, in practice, most transactions are not highly frequent. These properties cause a combinatorial explosion of patterns and make the threshold parameter difficult to tune. To overcome these issues, we investigate a pre-processing method called skeletonization, which was recently introduced for finding sequential patterns. It finds clusters of symbols in patterns, shrinking the space of all possible patterns in order to avoid the combinatorial explosion and to provide comprehensive patterns. The key idea is to compute similarities between symbols in patterns from a given database, based on the definition of the patterns we would like to mine, and then to cluster the symbols using the computed similarities. Although the original method does not take periods into account, we generalize it by using periodicity. We give experimental results on both synthetic and real datasets and compare the results of mining with and without skeletonization, which show that our method helps us to obtain comprehensive partially periodic patterns.
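The key idea (compute similarities between symbols, cluster them, then mine over cluster labels) can be illustrated with a minimal sketch. It is not the authors' definition: as a hypothetical stand-in for a periodic similarity, each symbol is summarized by a histogram of the phases t mod p at which it occurs, smoothed with a Rect-style window of radius r. Symbols are compared by cosine similarity, and the resulting similarity graph is split into two clusters by the sign of the Fiedler vector of the normalized Laplacian, the spectral relaxation of the normalized cut (Shi and Malik; von Luxburg). The function names are our own, and the code assumes NumPy is available.

```python
import numpy as np


def phase_profiles(seq, alphabet, period, radius=0):
    """Count how often each symbol occurs at each phase t mod period.

    ASSUMPTION (hypothetical, not the paper's definition): every occurrence
    is spread over the circular window of phases within `radius`, i.e. a
    rectangular (Rect-style) kernel. Keep radius < period / 2.
    """
    index = {a: i for i, a in enumerate(alphabet)}
    prof = np.zeros((len(alphabet), period))
    for t, sym in enumerate(seq):
        for d in range(-radius, radius + 1):
            prof[index[sym], (t % period + d) % period] += 1.0
    return prof


def cosine_similarity(prof):
    """Pairwise cosine similarity between rows (one row per symbol)."""
    norms = np.linalg.norm(prof, axis=1, keepdims=True)
    unit = prof / np.where(norms == 0, 1.0, norms)
    return unit @ unit.T


def spectral_bipartition(W):
    """Split vertices in two by the sign of the Fiedler vector of the
    symmetric normalized Laplacian (normalized-cut relaxation)."""
    W = W.copy()
    np.fill_diagonal(W, 0.0)  # no self-loops in the similarity graph
    deg = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.where(deg == 0, 1.0, deg))
    lap = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(lap)  # eigenvalues in ascending order
    fiedler = d_inv_sqrt * vecs[:, 1]  # second-smallest eigenvector
    return (fiedler > 0).astype(int)


def skeletonize(seq, period, radius=1):
    """Replace every symbol by the label of its cluster (two clusters)."""
    alphabet = sorted(set(seq))
    W = cosine_similarity(phase_profiles(seq, alphabet, period, radius))
    labels = spectral_bipartition(W)
    cluster_of = {a: f"C{labels[i]}" for i, a in enumerate(alphabet)}
    return [cluster_of[s] for s in seq], cluster_of


# Toy sequence with period 4: a/b alternate at phases 0-1, c/d at phases 2-3.
seq = list("aaccbbdd") * 3
skeleton, cluster_of = skeletonize(seq, period=4)
print(cluster_of)
print(skeleton[:8])
```

In the toy sequence, a and b share phases 0 and 1 while c and d share phases 2 and 3, so the sketch groups {a, b} and {c, d}; which group is labeled C0 depends on the arbitrary sign of the eigenvector. With radius 0 the two groups would be disconnected in the similarity graph and the second eigenvector would no longer be uniquely determined, which is why the smoothing window is used here.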
Notes
- 1.
Consider finding all partially periodic patterns of length at most k over \(\varSigma \). Let \(\varSigma _\star =\varSigma \cup \{\star \}\). All candidate patterns then lie in \(\varSigma _\star \cup \varSigma _\star ^2\cup \cdots \cup \varSigma _\star ^k\), a set of size \(\sum _{i=1}^{k}(|\varSigma |+1)^i\), which can be much larger than the set of patterns that actually appear in databases in practice.
- 2.
For example, if the values lie in [0, 10) and \(|\varSigma |=4\), each value is assigned to one of the bins [0, 2.5), [2.5, 5.0), [5.0, 7.5), or [7.5, 10), and each bin is labeled with one symbol of \(\varSigma \), which encodes the numeric sequence as a symbolic sequence.
- 3.
- 4.
Of course, most of them are infrequent patterns.
- 5.
A similarity graph is a weighted graph in which vertices represent data points and each edge is weighted by the similarity between its two endpoints.
- 6.
The function is defined as \(\mathrm {Rect}_{i,r}(t) = 0\) if \(|t-i|> r\), 1 otherwise.
- 7.
- 8.
Compiled with gcc 4.7 using -std=c++11, without any parallelization.
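The definitions in Notes 1, 2 and 6 are small enough to state directly in code. The sketch below is illustrative only: the names count_candidates, discretize and rect are our own (hypothetical), and discretize assumes the equal-width binning described in Note 2, with the alphabet passed as a string of symbols.

```python
# Hypothetical helper names; an illustrative sketch of Notes 1, 2 and 6.


def count_candidates(sigma_size, k):
    """Size of Σ⋆ ∪ Σ⋆^2 ∪ ... ∪ Σ⋆^k with Σ⋆ = Σ ∪ {⋆} (Note 1)."""
    s = sigma_size + 1  # |Σ⋆|: the alphabet plus the wildcard ⋆
    return sum(s ** i for i in range(1, k + 1))


def discretize(values, lo, hi, alphabet):
    """Equal-width binning of [lo, hi) into len(alphabet) bins (Note 2)."""
    n = len(alphabet)
    width = (hi - lo) / n
    symbols = []
    for v in values:
        if not lo <= v < hi:
            raise ValueError(f"value {v} is outside [{lo}, {hi})")
        # min() guards against floating-point rounding at the top edge
        symbols.append(alphabet[min(int((v - lo) / width), n - 1)])
    return "".join(symbols)


def rect(i, r, t):
    """Rect_{i,r}(t) = 0 if |t - i| > r, and 1 otherwise (Note 6)."""
    return 0 if abs(t - i) > r else 1


print(count_candidates(4, 3))  # prints 155 (= 5 + 25 + 125)
print(discretize([1.0, 2.5, 6.0, 9.0], 0.0, 10.0, "abcd"))  # prints abcd
print([rect(5, 2, t) for t in range(2, 9)])  # prints [0, 1, 1, 1, 1, 1, 0]
```

Even a four-symbol alphabet with k = 3 yields 155 candidates, and the count grows like \((|\varSigma |+1)^k\), which is the explosion Note 1 refers to.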
References
Alzate, C., Suykens, J.A.: Hierarchical kernel spectral clustering. Neural Netw. 35, 21–30 (2012)
Celma, O.: Music Recommendation and Discovery in the Long Tail. Springer, Heidelberg (2010)
Cichocki, A., Zdunek, R., Amari, S.I.: Nonnegative matrix and tensor factorization [lecture notes]. IEEE Sig. Process. Mag. 25(1), 142–145 (2008)
Han, J., Dong, G., Yin, Y.: Efficient mining of partial periodic patterns in time series database. In: Proceedings of 15th ICDE, pp. 106–115 (1999)
Han, J., Gong, W., Yin, Y.: Mining segment-wise periodic patterns in time-related databases. In: Proceedings of 4th KDD, pp. 214–218 (1998)
Liu, C., Zhang, K., Xiong, H., Jiang, G., Yang, Q.: Temporal skeletonization on sequential data: patterns, categorization, and visualization. In: Proceedings of 20th KDD, pp. 1336–1345 (2014)
Ng, A.Y., Jordan, M.I., Weiss, Y.: On spectral clustering: analysis and an algorithm. Adv. Neural Inf. Process. Syst. 13, 849–856 (2001)
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
Pei, J., Han, J., Mortazavi-Asl, B., Wang, J., Pinto, H., Chen, Q., Dayal, U., Hsu, M.C.: Mining sequential patterns by pattern-growth: the prefixspan approach. IEEE Trans. Knowl. Data Eng. 16(11), 1424–1440 (2004)
Shi, J., Malik, J.: Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 888–905 (2000)
Von Luxburg, U.: A tutorial on spectral clustering. Stat. Comput. 17(4), 395–416 (2007)
Yang, K.J., Hong, T.P., Chen, Y.M., Lan, G.C.: Projection-based partial periodic pattern mining for event sequences. Exp. Syst. Appl. 40(10), 4232–4240 (2013)
Acknowledgments
The authors would like to thank the anonymous reviewers for their valuable comments. This study was partially supported by Grant-in-Aid for JSPS Fellows (26-4555) and JSPS KAKENHI Grant Number 26280085.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Otaki, K., Yamamoto, A. (2015). Periodical Skeletonization for Partially Periodic Pattern Mining. In: Japkowicz, N., Matwin, S. (eds) Discovery Science. DS 2015. Lecture Notes in Computer Science, vol 9356. Springer, Cham. https://doi.org/10.1007/978-3-319-24282-8_16
DOI: https://doi.org/10.1007/978-3-319-24282-8_16
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24281-1
Online ISBN: 978-3-319-24282-8
eBook Packages: Computer Science, Computer Science (R0)