Abstract
Data clustering is an important task in data mining. In many real applications, clustering algorithms must respect the order of the data, which gives rise to the problem of clustering sequential data. For instance, analyzing the movement pattern of an object and detecting community structure in a complex network both involve sequential data clustering. The constraint that each cluster must cover a continuous region of the sequence prevents conventional clustering algorithms from being applied directly to the problem. A dynamic programming algorithm was previously proposed to address this issue; it returns the optimal sequential data clustering. However, it does not scale well, which limits its practicality. This paper revisits that solution and enhances it with a greedy stopping condition, which halts the algorithm’s search when the optimal solution has likely been found. Experimental results on multiple datasets show that the enhanced algorithm is much faster than the original while the optimality gap is negligible.
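To make the idea in the abstract concrete, the following is a minimal sketch of a dynamic program that splits a sequence into k contiguous clusters, with an optional early exit in the inner search. Several points are assumptions and not taken from the paper: the data are treated as a one-dimensional numeric sequence, the objective is the within-cluster sum of squared errors (SSE), and the stopping rule (stop scanning as soon as a candidate is worse than the best found so far) is a hypothetical stand-in for the paper's greedy stopping condition, which this section does not spell out. The function name `sequential_clustering` and the `greedy_stop` flag are illustrative and are not the API of the published package.

```python
from itertools import accumulate


def sequential_clustering(x, k, greedy_stop=False):
    """Split the 1-D sequence x into k contiguous clusters with low SSE.

    Returns (total_sse, bounds), where bounds is a list of half-open
    index ranges (start, end). With greedy_stop=False the search is
    exhaustive and the result is optimal for the SSE objective.
    """
    n = len(x)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= len(x)")

    # Prefix sums allow the SSE of any segment to be computed in O(1).
    s1 = [0.0] + list(accumulate(float(v) for v in x))
    s2 = [0.0] + list(accumulate(float(v) * float(v) for v in x))

    def sse(i, j):
        """SSE of the segment x[i:j] around its own mean (requires j > i)."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[c][j]: best cost found for the first j points using c clusters.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0

    for c in range(1, k + 1):
        for j in range(c, n + 1):
            # Scan the start index i of the last cluster from right to left.
            for i in range(j - 1, c - 2, -1):
                cand = cost[c - 1][i] + sse(i, j)
                if cand < cost[c][j]:
                    cost[c][j] = cand
                    cut[c][j] = i
                elif greedy_stop and cand > cost[c][j]:
                    # Hypothetical greedy rule (an assumption): give up
                    # once a candidate is worse than the best so far.
                    break

    # Recover the cluster boundaries by following the stored cut points.
    bounds = []
    j = n
    for c in range(k, 0, -1):
        i = cut[c][j]
        bounds.append((i, j))
        j = i
    bounds.reverse()
    return cost[k][n], bounds
```

Calling `sequential_clustering(data, k)` runs the exhaustive search, while `sequential_clustering(data, k, greedy_stop=True)` trades the optimality guarantee for fewer inner-loop iterations. Because the greedy variant only ever examines a subset of the candidates, its cost can never be lower than the exhaustive one; how close it gets in practice depends on the data and on the actual stopping condition used.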
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00357-024-09472-4/MediaObjects/357_2024_9472_Figa_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00357-024-09472-4/MediaObjects/357_2024_9472_Figb_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00357-024-09472-4/MediaObjects/357_2024_9472_Fig1_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00357-024-09472-4/MediaObjects/357_2024_9472_Fig2_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00357-024-09472-4/MediaObjects/357_2024_9472_Fig3_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00357-024-09472-4/MediaObjects/357_2024_9472_Fig4_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00357-024-09472-4/MediaObjects/357_2024_9472_Fig5_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00357-024-09472-4/MediaObjects/357_2024_9472_Fig6_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00357-024-09472-4/MediaObjects/357_2024_9472_Fig7_HTML.png)
Data Availability
The Abdominal and Direct Fetal ECG Database analyzed during the current study is available at https://physionet.org/content/adfecgdb/1.0.0/, the Brno University of Technology ECG Signal Database at https://physionet.org/content/but-pdb/1.0.0/, the MIT-BIH Arrhythmia Database at https://physionet.org/content/mitdb/1.0.0/, and the QT Database at https://physionet.org/content/qtdb/1.0.0/.
Notes
This can be achieved by subtracting \(\text {Mean}(X)\) from each data point \(x_i\) in X.
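As a small illustration of the centering step in this note, the sketch below subtracts the arithmetic mean from every point. The function name `center` is hypothetical, and the data are assumed to be a plain list of numbers.

```python
def center(xs):
    """Return a copy of xs with the arithmetic mean subtracted from each point."""
    mean = sum(xs) / len(xs)
    return [x - mean for x in xs]
```

After centering, the values sum to zero up to floating-point rounding.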
Usage: pip install accelerated-sequence-clustering.
Author information
Contributions
All authors contributed to the study’s conception and design. Material preparation, data collection, and analysis were performed by the authors. The first draft of the manuscript was written by Reza Mortazavi, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Ethical Conduct
Ethical conduct is not applicable for this article.
Conflict of Interest
The authors declare no competing interests.
About this article
Cite this article
Mortazavi, R., Enayati, E. & Basiri, A. Accelerated Sequential Data Clustering. J Classif 41, 245–263 (2024). https://doi.org/10.1007/s00357-024-09472-4
DOI: https://doi.org/10.1007/s00357-024-09472-4