Generalized k-means-based clustering for temporal data under weighted and kernel time warp☆
Introduction and related work
Temporal data arise naturally in many emerging applications, such as sensor networks, human mobility and the Internet of Things. Clustering is an important task, usually applied prior to other pattern analysis tasks for summarization and for cluster and prototype extraction, and it is crucial for dimensionality reduction on big data.
k-means-based clustering, viz. standard k-means, k-means++, fuzzy c-means and their many variants, is among the most popular clustering approaches, because it provides a good…
Generalized k-means for temporal data clustering
The k-means algorithm aims to partition a set of data points into distinct clusters such that the inertia within each cluster is minimized, the inertia being defined as the sum of the dissimilarities (squared Euclidean distances in standard k-means) between each data point in the cluster and the centroid (or representative) of the cluster. The k-means algorithm was originally developed with the Euclidean distance, the representative of each cluster being its center of gravity. This algorithm can, however, be…
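The inertia-minimizing partition described above is usually sought by alternating an assignment step and a representative update step (Lloyd's iteration). The following is a minimal Python sketch of that loop, not the authors' implementation: `generalized_kmeans`, `dissim` and `centroid` are hypothetical names, and the dissimilarity and the representative estimator are passed in as plug-ins so that the same loop could host a time warp measure and its centroid estimator. The usage lines instantiate the original setting (squared Euclidean distance and center of gravity).

```python
import random
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def generalized_kmeans(
    data: Sequence[T],
    k: int,
    dissim: Callable[[T, T], float],
    centroid: Callable[[List[T]], T],
    n_iter: int = 100,
    seed: int = 0,
) -> Tuple[List[int], List[T], float]:
    """Alternate assignment and representative updates to lower the inertia."""
    rng = random.Random(seed)
    # Initial representatives: k distinct data points drawn at random.
    centers = [data[i] for i in rng.sample(range(len(data)), k)]
    labels: List[int] = [-1] * len(data)
    for _ in range(n_iter):
        # Assignment step: each point joins its least dissimilar representative.
        new_labels = [min(range(k), key=lambda j: dissim(x, centers[j])) for x in data]
        if new_labels == labels:
            break  # partition unchanged, so the representatives are already up to date
        labels = new_labels
        # Update step: re-estimate each representative from its current members.
        for j in range(k):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:  # an empty cluster keeps its previous representative
                centers[j] = centroid(members)
    inertia = sum(dissim(x, centers[lab]) for x, lab in zip(data, labels))
    return labels, centers, inertia


def sq_euclid(a: Sequence[float], b: Sequence[float]) -> float:
    # Squared Euclidean distance, the dissimilarity of standard k-means.
    return sum((u - v) ** 2 for u, v in zip(a, b))


def mean(points: List[Sequence[float]]) -> Tuple[float, ...]:
    # Center of gravity, the representative used by standard k-means.
    return tuple(sum(c) / len(points) for c in zip(*points))


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    labels, centers, inertia = generalized_kmeans(pts, 2, sq_euclid, mean)
    print(labels, centers, inertia)
```

Swapping `sq_euclid` and `mean` for a time warp measure and a matching centroid estimator is what turns this loop into a temporal clustering method; the loop itself does not change.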
Centroid estimation for time warp measures
We first describe the general strategy used to estimate the representative (or centroid) of a cluster of data points X, and then study the solution this strategy yields for each of the three extended measures introduced above.
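To make the centroid-estimation problem concrete, the sketch below uses DTW Barycenter Averaging (DBA), the global averaging method of Petitjean et al. (2011), as a swapped-in stand-in; it is not the weighted estimation derived in this paper, whose details are not part of this snippet. It computes a classic unweighted DTW alignment (a squared local cost is assumed) and then performs one DBA update, in which each time stamp of the current centroid becomes the mean of all values aligned with it across the cluster. The names `dtw_path` and `dba_update` are hypothetical.

```python
from typing import List, Sequence, Tuple


def dtw_path(x: Sequence[float], y: Sequence[float]) -> Tuple[float, List[Tuple[int, int]]]:
    """Unweighted DTW with a squared local cost; returns (cost, optimal warping path)."""
    n, m = len(x), len(y)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    # Backtrack from (n, m) to (1, 1); ties are broken in favour of the diagonal move.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while i > 1 or j > 1:
        if i == 1:
            j -= 1
        elif j == 1:
            i -= 1
        else:
            move = min((D[i - 1][j - 1], 0), (D[i - 1][j], 1), (D[i][j - 1], 2))[1]
            if move == 0:
                i, j = i - 1, j - 1
            elif move == 1:
                i -= 1
            else:
                j -= 1
        path.append((i - 1, j - 1))
    path.reverse()
    return D[n][m], path


def dba_update(center: Sequence[float], series: List[Sequence[float]]) -> List[float]:
    """One DBA iteration: average, per centroid time stamp, all values aligned to it."""
    sums = [0.0] * len(center)
    counts = [0] * len(center)
    for s in series:
        _, path = dtw_path(center, s)
        for ci, si in path:
            sums[ci] += s[si]
            counts[ci] += 1
    # A DTW path visits every index of `center`, so no count is zero.
    return [total / cnt for total, cnt in zip(sums, counts)]


if __name__ == "__main__":
    cluster = [[0, 1, 2, 1, 0], [0, 0, 1, 2, 1, 0], [0, 1, 2, 2, 1, 0]]
    center = list(map(float, cluster[0]))  # initial guess: one member series
    for _ in range(5):
        center = dba_update(center, cluster)
    print(center)
```

Iterating `dba_update` refines the centroid under a fixed length; the estimation proposed in the paper additionally handles weighted and kernel time warp measures, which this sketch does not attempt.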
Experiments
In this section, we first describe the datasets used in our experiments, and then compare the generalized k-means algorithms, based on the extended measures wdtw (Eq. (4)) and those of Eq. (5) and on the centroid estimations given in Section 3, with two alternative approaches: i) k-medoids with the standard unweighted measure and ii) kernel k-means with the standard unweighted and temporal kernels.
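Kernel k-means, one of the baselines above, never forms centroids explicitly: the squared distance between a point and a cluster mean in feature space is expanded with the kernel trick as ‖φ(x_i) − μ_c‖² = K_ii − (2/|c|) Σ_{j∈c} K_ij + (1/|c|²) Σ_{a,b∈c} K_ab. The sketch below assumes a precomputed Gram matrix K; for temporal data it could come from a positive-definite temporal kernel such as the global alignment kernel, while the Gaussian kernel on scalars in the test is only a stand-in. The name `kernel_kmeans` and the initialization from k randomly chosen points are hypothetical choices, not the experimental protocol of the paper.

```python
import math
import random
from typing import List, Sequence


def kernel_kmeans(K: Sequence[Sequence[float]], k: int,
                  n_iter: int = 100, seed: int = 0) -> List[int]:
    """Kernel k-means on a precomputed Gram matrix K (hypothetical helper)."""
    n = len(K)
    rng = random.Random(seed)

    def feat_dist(i: int, j: int) -> float:
        # Squared feature-space distance between two single points.
        return K[i][i] - 2.0 * K[i][j] + K[j][j]

    # Initialisation (an assumption): pick k points and assign every point
    # to the nearest of them in feature space.
    seeds = rng.sample(range(n), k)
    labels = [min(range(k), key=lambda c: feat_dist(i, seeds[c])) for i in range(n)]

    for _ in range(n_iter):
        members = [[i for i in range(n) if labels[i] == c] for c in range(k)]
        # (1/|c|^2) * sum_{a,b in c} K[a][b] is the same for every point, so compute once.
        self_term = [
            sum(K[a][b] for a in m for b in m) / len(m) ** 2 if m else 0.0
            for m in members
        ]
        new_labels = []
        for i in range(n):
            best, best_d = labels[i], math.inf
            for c, m in enumerate(members):
                if not m:
                    continue  # an empty cluster has no mean in feature space
                d = K[i][i] - 2.0 * sum(K[i][j] for j in m) / len(m) + self_term[c]
                if d < best_d:
                    best, best_d = c, d
            new_labels.append(best)
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
    return labels
```

Because only kernel evaluations are needed, the same routine serves any temporal kernel, but the result is a partition without an explicit time series prototype, which is the gap the centroid-based approach targets.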
Conclusion
This work introduces a generalized centroid-based clustering algorithm for temporal data under time warp measures. To this end, we propose i) an extension of the common time warp measures and ii) a tractable, fast and efficient estimation of the cluster representatives under the extended time warp measures, which captures local temporal features. The efficiency of the algorithm is evaluated on a wide range of challenging datasets, which are non-isotropic (i.e., non-spherical), not well-isolated…
Cited by (55)
- K-sets and k-swaps algorithms for clustering sets (2023, Pattern Recognition)
- Rectified Euler k-means and beyond (2023, Pattern Recognition)
- Measuring the spatiotemporal evolution of accident hot spots (2021, Accident Analysis and Prevention)
  Citation excerpt: The output of these types of algorithms is an assigned group for each of the input features. K-means and DBSCAN have been used to examine both the spatial and temporal dimensions of accident clustering (Soheily-Khah et al., 2016). For example, the DBSCAN method has been extended to incorporate a temporal neighborhood in addition to a spatial neighborhood (e.g., ST-DBSCAN (Birant and Kut, 2007)).
- Generalized k-means in GLMs with applications to the outbreak of COVID-19 in the United States (2021, Computational Statistics and Data Analysis)
- A flight maneuver recognition method based on multi-strategy affine canonical time warping (2020, Applied Soft Computing Journal)
- Interpretable time series kernel analytics by pre-image estimation (2020, Artificial Intelligence)
☆ This paper has been recommended for acceptance by G. Moser.