An Enhanced Parameter-Free Subsequence Time Series Clustering for High-Variability-Width Data

Madicar, Navin; Sivaraks, Haemwaan; Rodpongpun, Sura; Ratanamahatana, Chotirat Ann

doi:10.1007/978-3-319-07692-8_40

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 287)

Abstract

In time series mining, subsequence time series (STS) clustering is widely used as a subroutine in other mining tasks, e.g., anomaly detection, classification, and rule discovery. Its main objective is to group similar underlying subsequences together. Beyond the known problem that STS clustering results can be meaningless, a further challenge arises when the subsequence patterns have variable lengths. Existing approaches handle only cases where the range of width variability is small, and they rely on predefined parameters, which makes them impractical for real-world data. We therefore propose a new algorithm that handles much larger variability in pattern widths while remaining parameter-free, so that users no longer face the difficult task of parameter selection. The Minimum Description Length (MDL) principle and a motif discovery technique are used to determine the proper widths of the subsequences. The experimental results confirm that the proposed algorithm effectively handles very large width variability in time series subsequence patterns, outperforming all other recent STS clustering algorithms.
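
The abstract does not say how the MDL principle is applied, so the following is only a minimal sketch of one common way to measure description length for time series: discretize the values, then take the length times the entropy in bits. The function names, the `cardinality` parameter, and the idea of scoring a pair of subsequences by the bits saved when one is encoded as a difference from the other are all assumptions made for illustration, not the authors' published procedure.

```python
import numpy as np


def discretize(ts, cardinality=64):
    """Min-max normalise a series and quantise it to integers in
    [0, cardinality - 1]. The cardinality is an illustrative choice."""
    ts = np.asarray(ts, dtype=float)
    lo, hi = ts.min(), ts.max()
    if hi == lo:
        # A constant series carries no information after normalisation.
        return np.zeros(len(ts), dtype=int)
    scaled = (ts - lo) / (hi - lo)
    return np.round(scaled * (cardinality - 1)).astype(int)


def description_length(values):
    """Entropy-based description length in bits: m * H(values),
    where m is the number of points and H is the Shannon entropy."""
    values = np.asarray(values)
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    entropy = -np.sum(p * np.log2(p))
    return len(values) * entropy


def bits_saved(a, b):
    """Bits saved by storing b as its difference from a instead of
    storing b directly. Both inputs must be discretized and of equal
    length. Larger values mean a describes b better."""
    a = np.asarray(a)
    b = np.asarray(b)
    return description_length(b) - description_length(b - a)
```

Under these assumptions, two near-identical subsequences yield a difference vector that is mostly zeros and therefore cheap to encode, so `bits_saved` is large. A width-selection step could compare such scores across candidate widths, but how the paper combines this with motif discovery is not stated in the abstract.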

Author information

Authors and Affiliations

Dept. of Computer Engineering, Chulalongkorn University, 254 Phayathai Rd., Pathumwan, Bangkok, 10330, Thailand
Navin Madicar, Haemwaan Sivaraks, Sura Rodpongpun & Chotirat Ann Ratanamahatana

Corresponding author

Correspondence to Navin Madicar.

Editor information

Editors and Affiliations

Department of Information System, Faculty of Comp. Sci. & Info. Tech., University of Malaya, Kuala Lumpur, Malaysia
Tutut Herawan
Faculty of Comp. Sci. and Info. Tech., Universiti Tun Hussein Onn Malaysia, Parit Raja, Malaysia
Rozaida Ghazali
Faculty of Comp. Sci. and Info. Tech., Universiti Tun Hussein Onn Malaysia, Parit Raja, Malaysia
Mustafa Mat Deris

About this paper

Cite this paper

Madicar, N., Sivaraks, H., Rodpongpun, S., Ratanamahatana, C.A. (2014). An Enhanced Parameter-Free Subsequence Time Series Clustering for High-Variability-Width Data. In: Herawan, T., Ghazali, R., Deris, M. (eds) Recent Advances on Soft Computing and Data Mining. Advances in Intelligent Systems and Computing, vol 287. Springer, Cham. https://doi.org/10.1007/978-3-319-07692-8_40

DOI: https://doi.org/10.1007/978-3-319-07692-8_40
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07691-1
Online ISBN: 978-3-319-07692-8
eBook Packages: Engineering, Engineering (R0)
