Accelerated Sequential Data Clustering

Mortazavi, Reza; Enayati, Elham; Basiri, Abdolali

doi:10.1007/s00357-024-09472-4

Accelerated Sequential Data Clustering

Original Research
Published: 09 May 2024

Volume 41, pages 245–263, (2024)
Cite this article

Journal of Classification Aims and scope Submit manuscript

189 Accesses
Explore all metrics

Abstract

Data clustering is an important task in the field of data mining. In many real applications, clustering algorithms must consider the order of data, resulting in the problem of clustering sequential data. For instance, analyzing the moving pattern of an object and detecting community structure in a complex network are related to sequential data clustering. The constraint of the continuous region prevents previous clustering algorithms from being directly applied to the problem. A dynamic programming algorithm was proposed to address the issue, which returns the optimal sequential data clustering. However, it is not scalable and hence the practicality is limited. This paper revisits the solution and enhances it by introducing a greedy stopping condition. This condition halts the algorithm’s search process when it is likely that the optimal solution has been found. Experimental results on multiple datasets show that the algorithm is much faster than its original solution while the optimality gap is negligible.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Algorithm 1

Algorithm 2

Online clustering of streaming trajectories

Article 23 March 2018

A Novel Sequential Pattern Mining Algorithm for Large Scale Data Sequences

Incremental mining of temporal patterns in interval-based database

Article 27 February 2015

Data Availability

The Abdominal and Direct Fetal ECG Database analyzed during the current study is available in the https://physionet.org/content/adfecgdb/1.0.0/, the Brno University of Technology ECG Signal Database is available in the https://physionet.org/content/but-pdb/1.0.0/, the MIT-BIH arrhythmia database is available in the https://physionet.org/content/mitdb/1.0.0/, and the QT database is available in https://physionet.org/content/qtdb/1.0.0/ repository.

Notes

This can be achieved by subtracting $\text {Mean}(X)$ from each data point $x_i$ in X
Usage: pip install accelerated-sequence-clustering.
https://physionet.org/content/adfecgdb/1.0.0/
https://physionet.org/content/but-pdb/1.0.0/
https://physionet.org/content/mitdb/1.0.0/
https://physionet.org/content/qtdb/1.0.0/
https://www.math.uwaterloo.ca/tsp/world/grpoints.html

References

Abbasi, M., Bhaskara, A., & Venkatasubramanian, S. (2021). Fair clustering via equitable group representations. In: Proceedings of the ACM conference on fairness, accountability, and transparency (pp. 504–514)
Abbasimehr, H., & Baghery, F. S. (2022). A novel time series clustering method with fine-tuned support vector regression for customer behavior analysis. Expert Systems with Applications (p. 117584)
Aloise, D., Deshpande, A., Hansen, P., et al. (2009). NP-hardness of Euclidean sum-of-squares clustering. Machine Learning, 75(2), 245–248.
Article Google Scholar
Arthur, D. (2007). K-means++: The advantages of careful seeding. In: Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms (pp. 1027–1035). New Orleans, Louisiana, Society for Industrial and Applied Mathematics.
Bigdeli, A., Maghsoudi, A., & Ghezelbash, R. (2022). Application of self-organizing map (SOM) and K-means clustering algorithms for portraying geochemical anomaly patterns in Moalleman District, NE Iran. Journal of Geochemical Exploration, 233(106), 923.
Google Scholar
Cerqueti, R., D’Urso, P., De Giovanni, L., et al. (2022). Weighted score-driven fuzzy clustering of time series with a financial application. Expert Systems with Applications, 198(116), 752.
Google Scholar
Chan, Z. S., Collins, L., & Kasabov, N. (2006). An efficient greedy k-means algorithm for global gene trajectory clustering. Expert Systems with Applications, 30(1), 137–141.
Article Google Scholar
Ding, C., Sun, S., & Zhao, J. (2022). MST-GAT: A multimodal spatial-temporal graph attention network for time series anomaly detection. Information Fusion,.
Dogan, A., & Birant, D. (2022). K-centroid link: A novel hierarchical clustering linkage method. Applied Intelligence, 52(5), 5537–5560.
Article Google Scholar
Dupin, N., Nielsen, F., & Talbi, E. (2018). Dynamic programming heuristic for K-means clustering among a 2-dimensional Pareto frontier. In: 7th International conference on metaheuristics and nature inspired computing (pp. 1–8)
Enayati, E., Mortazavi, R., Basiri, A., et al. (2023). Time series anomaly detection via clustering-based representation. Evolving Systems. In press
Frey, B. J., & Dueck, D. (2007). Clustering by passing messages between data points. Science, 315(5814), 972–976.
Article MathSciNet Google Scholar
Houssein, E. H., Ibrahim, I. E., Neggaz, N., et al. (2021). An efficient ECG arrhythmia classification method based on Manta ray foraging optimization. Expert Systems with Applications, 181(115), 131.
Google Scholar
Jezewski, J., Matonia, A., Kupka, T., et al. (2012). Determination of fetal heart rate from abdominal signals: Evaluation of beat-to-beat accuracy in relation to the direct fetal electrocardiogram. Biomedizinische Technik/Biomedical Engineering, 57(5), 383–394.
Article Google Scholar
Kalti, K., & Touil, A. (2023). A robust contextual fuzzy C-means clustering algorithm for noisy image segmentation. Journal of Classification. In press
Kaya, M. F., & Schoop, M. (2022). Analytical comparison of clustering techniques for the recognition of communication patterns. Group Decision and Negotiation, 31(3), 555–589.
Article Google Scholar
Laguna, P., Mark, R. G., Goldberg, A., et al. (1997). A database for evaluation of algorithms for measurement of QT and other waveform intervals in the ECG. In: Computers in cardiology 1997 (pp. 673–676). IEEE
Lei, T., Jia, X., Zhang, Y., et al. (2018). Significantly fast and robust fuzzy C-means clustering algorithm based on morphological reconstruction and membership filtering. IEEE Transactions on Fuzzy Systems, 26(5), 3027–3041.
Article Google Scholar
Li, A., Xiong, S., Li, J., et al. (2022). AngClust: Angle feature-based clustering for short time series gene expression profiles. IEEE/ACM Transactions on Computational Biology and Bioinformatics,.
Li, H. (2019). Multivariate time series clustering based on common principal component analysis. Neurocomputing, 349, 239–247.
Article Google Scholar
Li, X., & Liu, H. (2018). Greedy optimization for K-means-based consensus clustering. Tsinghua Science and Technology, 23(2), 184–194.
Article Google Scholar
Li, Y., Ma, J., Miao, Y., et al. (2020). Similarity search for encrypted images in secure cloud computing. IEEE Transactions on Cloud Computing,.
Lin, C. R., & Chen, M. S. (2002). On the optimal clustering of sequential data. In: Proceedings of the 2002 SIAM international conference on data mining (pp. 141–157). SIAM
Maršánová, L., Smisek, R., Němcová, A., et al. (2021). Brno University of Technology ECG signal database with annotations of P wave (BUT PDB)
Moody, G. B., & Mark, R. G. (2001). The impact of the MIT-BIH arrhythmia database. IEEE Engineering in Medicine and Biology Magazine, 20(3), 45–50.
Article Google Scholar
Mortazavi, R., & Erfani, S. H. (2018). An effective method for utility preserving social network graph anonymization based on mathematical modeling. International Journal of Engineering, 31(10), 1624–1632.
Google Scholar
Mortazavi, R., & Jalili, S. (2014). Fast data-oriented microaggregation algorithm for large numerical datasets. Knowledge-Based Systems, 67, 195–205.
Article Google Scholar
Mortazavi, R., & Jalili, S. (2017). Fine granular proximity breach prevention during numerical data anonymization. Transactions on Data Privacy, 10(2), 117–144.
Google Scholar
Moshkovitz, M., Dasgupta, S., Rashtchian, C., et al. (2020). Explainable K-means and K-medians clustering. In: International Conference on Machine Learning (pp. 7055–7065). PMLR
Nielsen, F. (2016). Hierarchical clustering. In: Introduction to HPC with MPI for data science (pp. 195–211). Springer, chap 8
Pakhira, M. K. (2014). A linear time-complexity k-means algorithm using cluster shifting. In: International conference on computational intelligence and communication networks (pp. 1047–1051). IEEE
Pasupathi, S., Shanmuganathan, V., Madasamy, K., et al. (2021). Trend analysis using agglomerative hierarchical clustering approach for time series big data. The Journal of Supercomputing, 77(7), 6505–6524.
Article Google Scholar
Sun, L., Qin, X., Ding, W., et al. (2022). Nearest neighbors-based adaptive density peaks clustering with optimized allocation strategy. Neurocomputing, 473, 159–181.
Article Google Scholar
Suo, Y., Ji, Y., Zhang, Z., et al. (2022). A formal and visual data-mining model for complex ship behaviors and patterns. Sensors, 22(14), 5281.
Article Google Scholar
Wang, H., & Song, M. (2011). Ckmeans. 1d. dp: Optimal K-means clustering in one dimension by dynamic programming. The R journal, 3(2), 29.
Article Google Scholar
Wang, Q., Zhang, F., & Li, X. (2018). Optimal clustering framework for hyperspectral band selection. IEEE Transactions on Geoscience and Remote Sensing, 56(10), 5910–5922.
Google Scholar

Download references

Author information

Authors and Affiliations

School of Engineering, Damghan University, 3671645667, Damghan, Iran
Reza Mortazavi
Department of Computer Science, Faculty of Basic Sciences, University of Bojnord, Bojnord, Iran
Elham Enayati
School of Mathematics and Computer Sciences, Damghan University, 3671645667, Damghan, Iran
Abdolali Basiri

Authors

Reza Mortazavi
View author publications
You can also search for this author in PubMed Google Scholar
Elham Enayati
View author publications
You can also search for this author in PubMed Google Scholar
Abdolali Basiri
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

All authors contributed to the study’s conception and design. Material preparation, data collection, and analysis. The first draft of the manuscript was written by Reza Mortazavi, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Reza Mortazavi.

Ethics declarations

Ethical Conduct

Ethical conduct is not applicable for this article.

Conflict of Interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article

Mortazavi, R., Enayati, E. & Basiri, A. Accelerated Sequential Data Clustering. J Classif 41, 245–263 (2024). https://doi.org/10.1007/s00357-024-09472-4

Download citation

Accepted: 22 April 2024
Published: 09 May 2024
Issue Date: July 2024
DOI: https://doi.org/10.1007/s00357-024-09472-4

Keywords

Access this article

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Accelerated Sequential Data Clustering

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Online clustering of streaming trajectories

A Novel Sequential Pattern Mining Algorithm for Large Scale Data Sequences

Incremental mining of temporal patterns in interval-based database

Data Availability

Notes

References

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Ethical Conduct

Conflict of Interest

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Keywords

Subscribe and save

Buy Now

Navigation

Accelerated Sequential Data Clustering

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Online clustering of streaming trajectories

A Novel Sequential Pattern Mining Algorithm for Large Scale Data Sequences

Incremental mining of temporal patterns in interval-based database

Data Availability

Notes

References

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Ethical Conduct

Conflict of Interest

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Subscribe and save

Buy Now

Search

Navigation