Abstract
Mining sequences and patterns in time series data streams is fast becoming common practice. Rapid progress in data collection and web technologies has produced tremendous growth in streaming data, in various complex forms, that must be analyzed in real time. Traditional data mining methods, which typically require the data to be scanned repeatedly, are not feasible for stream data applications. Newer techniques such as SPRING address these challenges by identifying sequences of patterns in time series streams while reducing the complexity to linear in both time and space. Unfortunately, SPRING does not support data normalization, which makes it inapplicable to most data sets. In this paper, we propose NSPRING, an approach based on SPRING that retains its advantages, e.g., low time and space complexity, while also supporting normalization. Furthermore, NSPRING retains mining accuracy similar to that of SPRING.
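To make the streaming setting concrete, the following is a minimal sketch of a SPRING-style best-match search under the time warping distance. It is a simplified illustration, not the authors' NSPRING and not a full SPRING implementation (it omits the threshold-based reporting of disjoint matches). The function name `best_subsequence_match`, the squared-difference cost, and the sample data are assumptions made for this example. Each incoming value updates a single column of length m + 1, where m is the query length, so the work per stream value and the memory are both O(m).

```python
import math


def best_subsequence_match(stream, query):
    """Return (distance, start, end) of the best-matching stream subsequence.

    Indices are 0-based and inclusive. Illustrative sketch only.
    """
    m = len(query)
    # d[i]: smallest warping distance between query[:i] and any stream
    # subsequence ending at the previous tick. d[0] = 0 lets a match
    # start at any tick; s[i] is the start tick of that alignment.
    d = [0.0] + [math.inf] * m
    s = [0] * (m + 1)
    best = (math.inf, -1, -1)

    for t, x in enumerate(stream):
        new_d = [0.0] * (m + 1)
        new_s = [t] * (m + 1)  # new_s[0] = t: a fresh match may start here
        for i in range(1, m + 1):
            cost = (x - query[i - 1]) ** 2  # assumed local cost
            # Three predecessors of the usual warping recurrence.
            # min() with a key returns the first minimal candidate, so a
            # fresh start at tick t wins ties for i == 1.
            prev_d, prev_s = min(
                (
                    (new_d[i - 1], new_s[i - 1]),  # same tick, previous query item
                    (d[i], s[i]),                  # previous tick, same query item
                    (d[i - 1], s[i - 1]),          # diagonal step
                ),
                key=lambda c: c[0],
            )
            new_d[i] = cost + prev_d
            new_s[i] = prev_s
        d, s = new_d, new_s
        if d[m] < best[0]:
            best = (d[m], s[m], t)
    return best


query = [1, 2, 3]
stream = [5, 5, 1, 2, 3, 5, 5]
print(best_subsequence_match(stream, query))  # (0.0, 2, 4)

# The same shape shifted by a constant offset no longer matches exactly,
# because raw values are compared without normalization.
shifted = [v + 10 for v in stream]
print(best_subsequence_match(shifted, query)[0] > 0)  # True
```

The second call illustrates the limitation the abstract describes: a pattern with the same shape but a different offset is penalized when values are compared without normalization.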
Acknowledgments
The authors are thankful for the financial support from the research grant “Temporal Data Stream Mining by Using Incrementally Optimized Very Fast Decision Forest (iOVFDF),” Grant no. MYRG2015-00128-FST, offered by the University of Macau, FST, and RDAO. We would also like to acknowledge the kind assistance of Ms. Katlin Kreamer-Tonin for proofreading this paper.
Appendix
See Table 5.
About this article
Cite this article
Gong, X., Fong, S., Chan, J.H. et al. NSPRING: the SPRING extension for subsequence matching of time series supporting normalization. J Supercomput 72, 3801–3825 (2016). https://doi.org/10.1007/s11227-015-1525-6
DOI: https://doi.org/10.1007/s11227-015-1525-6