ABSTRACT
Timeseries can be similar in shape but differ in length. For example, the sound waves produced by the same word spoken twice have roughly the same shape, but one may be shorter in duration. Applications such as stream data mining, approximate querying of image and video databases, data compression, and near-duplicate detection need to classify or cluster such timeseries, and to search for and rank the timeseries that are most similar to a chosen query timeseries. We demonstrate software for clustering and similarity search in databases of timeseries with high and variable dimensionality. Our demonstration uses Timeseries Sensitive Hashing (TSH) [3] to index the timeseries. TSH adapts Locality Sensitive Hashing (LSH), an approximate algorithm for indexing data points in a d-dimensional space under a distance function such as Euclidean distance. Unlike LSH, TSH can index points that do not all have the same dimensionality. To illustrate the potential of TSH, the demonstration will index and classify timeseries drawn from an image database, as well as timeseries describing human motion extracted from a video stream and from a motion capture system.
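To make the LSH description above concrete, the following Python sketch shows a standard p-stable LSH index for Euclidean distance: each hash is h(v) = floor((a · v + b) / w), and a query is compared exactly only against items that share a bucket with it in at least one table. Because such an index accepts vectors of one fixed dimensionality only, the sketch makes variable-length timeseries fit by linearly resampling each one to a common length L. That resampling step is a swapped-in, hypothetical technique and is not TSH [3]; the names `resample` and `EuclideanLSH`, and the values of L, w, the number of tables, and the hashes per table, are illustrative assumptions.

```python
import math
import random
from collections import defaultdict


def resample(series, length):
    """Linearly interpolate `series` onto `length` evenly spaced points.

    Hypothetical stand-in for variable-length input: every timeseries is
    mapped to the same dimensionality so that plain LSH can hash it.
    This is NOT how TSH handles variable dimensionality.
    """
    if length < 2 or not series:
        raise ValueError("need a non-empty series and length >= 2")
    n = len(series)
    if n == 1:
        return [float(series[0])] * length
    step = (n - 1) / (length - 1)
    out = []
    for i in range(length):
        pos = i * step
        lo = min(int(pos), n - 2)  # left neighbour of the interpolation point
        frac = pos - lo
        out.append(series[lo] * (1 - frac) + series[lo + 1] * frac)
    return out


class EuclideanLSH:
    """p-stable LSH for Euclidean distance: h(v) = floor((a . v + b) / w).

    Each of `num_tables` hash tables keys a vector by the tuple of
    `hashes_per_table` such values; a has i.i.d. N(0, 1) entries and b is
    uniform in [0, w). All parameter defaults are illustrative assumptions.
    """

    def __init__(self, dim, num_tables=4, hashes_per_table=3, w=4.0, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        self.w = w
        self.funcs = [
            [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, w))
             for _ in range(hashes_per_table)]
            for _ in range(num_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(num_tables)]
        self.data = {}

    def _key(self, table_funcs, v):
        # Concatenate the table's hash values into one bucket key.
        return tuple(
            math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / self.w)
            for a, b in table_funcs
        )

    def insert(self, item_id, v):
        # Plain LSH requires every indexed point to have exactly `dim` values.
        if len(v) != self.dim:
            raise ValueError(f"expected {self.dim} values, got {len(v)}")
        self.data[item_id] = v
        for funcs, table in zip(self.funcs, self.tables):
            table[self._key(funcs, v)].append(item_id)

    def query(self, v, k=1):
        # Candidates are items sharing a bucket with v in at least one table;
        # only these are ranked by exact Euclidean distance (approximate search).
        if len(v) != self.dim:
            raise ValueError(f"expected {self.dim} values, got {len(v)}")
        candidates = set()
        for funcs, table in zip(self.funcs, self.tables):
            candidates.update(table.get(self._key(funcs, v), []))
        return sorted(candidates, key=lambda i: math.dist(self.data[i], v))[:k]


# Usage: one sine period sampled at two different lengths, plus a ramp.
L = 32  # assumed common length after resampling
short_wave = [math.sin(2 * math.pi * t / 40) for t in range(40)]
long_wave = [math.sin(2 * math.pi * t / 90) for t in range(90)]
ramp = [t / 60 for t in range(60)]

index = EuclideanLSH(dim=L)
index.insert("short", resample(short_wave, L))
index.insert("ramp", resample(ramp, L))
# Expected to favour "short", but LSH is approximate and can miss a neighbour.
print(index.query(resample(long_wave, L)))
```

Uniform resampling erases differences in duration entirely, which suits the spoken-word example only when one utterance is a uniformly stretched copy of the other; the point of TSH, as stated above, is to index timeseries without requiring them to share one dimensionality.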
REFERENCES
[1] I. Assent, R. Krieger, F. Afschari, and T. Seidl. The TS-tree: efficient time series search and retrieval. In 11th Conference on Extending Database Technology (EDBT), pages 252–263, New York, NY, USA, 2008.
[2] Berkeley DB. http://www.oracle.com/database/berkeleydb/, 2011.
[3] O. U. Florez, A. Ocsa, and C. E. Dyreson. Sublinear querying of realistic timeseries and its application to human motion. In Multimedia Information Retrieval (MIR), pages 137–146, 2010.
[4] A. W.-c. Fu, E. Keogh, L. Y. H. Lau, and C. A. Ratanamahatana. Scaling and time warping in time series querying. In VLDB '05: Proceedings of the 31st International Conference on Very Large Data Bases, pages 649–660. VLDB Endowment, 2005.
[5] A. Guttman. R-trees: A dynamic index structure for spatial searching. In SIGMOD Conference, pages 47–57, 1984.
[6] E. Keogh and C. A. Ratanamahatana. Exact indexing of dynamic time warping. Knowledge and Information Systems, 7(3):358–386, 2005.
[7] H. Koga, T. Ishibashi, and T. Watanabe. Fast agglomerative hierarchical clustering algorithm using locality-sensitive hashing. Knowledge and Information Systems, 12:25–53, May 2007.
[8] J. Shieh and E. Keogh. iSAX: indexing and mining terabyte sized time series. In KDD '08: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 623–631, New York, NY, USA, 2008. ACM.
[9] Y.-P. Wu, J.-J. Guo, and X.-J. Zhang. A linear DBScan algorithm based on LSH. In 6th Conference on Machine Learning and Cybernetics, August 2007.
Index Terms
- Scalable similarity search of timeseries with variable dimensionality