Comparison of different weighting schemes for the kNN classifier on time-series data

Geler, Zoltan; Kurbalija, Vladimir; Radovanović, Miloš; Ivanović, Mirjana

doi:10.1007/s10115-015-0881-0


Regular Paper
Published: 25 September 2015

Knowledge and Information Systems, Volume 48, pages 331–378 (2016)



Abstract

Many well-known machine learning algorithms have been applied to the task of time-series classification, including decision trees, neural networks, support vector machines and others. However, it has been shown that the simple 1-nearest neighbor (1NN) classifier, coupled with an elastic distance measure like Dynamic Time Warping (DTW), often produces better results than more complex classifiers on time-series data, including k-nearest neighbor (kNN) for values of \(k>1\). In this article, we revisit the kNN classifier on time-series data by considering ten classic distance-based vote weighting schemes in the context of Euclidean distance, as well as four commonly used elastic distance measures: DTW, Longest Common Subsequence (LCS), Edit Distance with Real Penalty (ERP) and Edit Distance on Real sequence (EDR). Through experiments on the complete collection of UCR time-series datasets, we confirm the view that the 1NN classifier is very hard to beat. Overall, for all considered distance measures, we found that variants of the Dudani weighting scheme produced the best results.
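As a rough illustration of the two ingredients the abstract combines, the sketch below pairs an elastic distance (unconstrained DTW) with distance-weighted kNN voting using the Dudani (1976) weights: for the k neighbors sorted by distance \(d_1 \le \dots \le d_k\), neighbor \(i\) gets \(w_i = (d_k - d_i)/(d_k - d_1)\), and all weights are 1 when \(d_k = d_1\). This is a minimal sketch, not the authors' implementation; the squared point-wise cost, the absence of a warping window, and the names `dtw`, `dudani_weights` and `knn_predict` are assumptions made for illustration.

```python
import math
from collections import defaultdict
from typing import Callable, Hashable, Sequence

Series = Sequence[float]


def dtw(a: Series, b: Series) -> float:
    """Unconstrained DTW with squared point-wise cost (no warping window)."""
    n, m = len(a), len(b)
    # d[i][j] holds the cheapest warping-path cost aligning a[:i] with b[:j].
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return math.sqrt(d[n][m])


def dudani_weights(dists: Sequence[float]) -> list[float]:
    """Dudani weights for neighbor distances sorted in ascending order."""
    d1, dk = dists[0], dists[-1]
    if dk == d1:
        # All neighbors equally distant (this includes k = 1): equal weights.
        return [1.0] * len(dists)
    # Nearest neighbor gets 1, farthest gets 0, linear in between.
    return [(dk - d) / (dk - d1) for d in dists]


def knn_predict(
    query: Series,
    train_x: Sequence[Series],
    train_y: Sequence[Hashable],
    k: int,
    dist: Callable[[Series, Series], float] = dtw,
    weighting: Callable[[Sequence[float]], list[float]] = dudani_weights,
) -> Hashable:
    # Rank training series by distance to the query and keep the k nearest.
    ranked = sorted(
        ((dist(query, x), y) for x, y in zip(train_x, train_y)),
        key=lambda pair: pair[0],
    )[:k]
    weights = weighting([d for d, _ in ranked])
    votes: dict = defaultdict(float)
    for w, (_, label) in zip(weights, ranked):
        votes[label] += w
    # Highest total weight wins; ties go to the label seen first (nearest).
    return max(votes, key=votes.get)


# Toy data: two "up" and two "down" series; the query has a different length.
train_x = [[0, 0, 1, 1], [0, 1, 1, 1], [1, 1, 0, 0], [1, 0, 0, 0]]
train_y = ["up", "up", "down", "down"]
print(knn_predict([0, 0, 0, 1, 1], train_x, train_y, k=3))  # prints: up
```

With \(k=1\) the single neighbor receives weight 1, so `knn_predict` reduces to the plain 1NN rule the abstract refers to. Swapping `dist` for another measure (Euclidean, LCS, ERP, EDR) or `weighting` for another scheme leaves the voting logic unchanged, which is the kind of comparison the paper describes.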


References

Agrawal R, Faloutsos C, Swami A (1993) Efficient similarity search in sequence databases. In: Lomet DB (ed) Proceedings of the 4th international conference on foundations of data organization and algorithms (FODO’93). Springer, Berlin Heidelberg, pp 69–84
Bache K, Lichman M (2013) UCI machine learning repository. University of California, School of Information and Computer Science, Irvine, CA
Berndt D, Clifford J (1994) Using dynamic time warping to find patterns in time series. In: Fayyad UM, Uthurusamy R (eds) Knowledge discovery in databases: papers from the 1994 AAAI workshop. AAAI Press, Seattle, Washington, pp 359–370
Bouckaert RR, Frank E (2004) Evaluating the replicability of significance tests for comparing learning algorithms. In: Dai H, Srikant R, Zhang C (eds) Advances in knowledge discovery and data mining. Springer, Berlin, Heidelberg, pp 3–12
Brockwell PJ, Davis RA (2002) Introduction to time series and forecasting. Springer, New York
Chen L, Ng R (2004) On the marriage of lp-norms and edit distance. In: Nascimento MA, Özsu MT, Kossmann D, et al. (eds) Proceedings of the thirtieth international conference on very large data bases, Toronto, Canada, August 31–September 3, 2004. Morgan Kaufmann, pp 792–803
Chen L, Özsu MT, Oria V (2005) Robust and fast similarity search for moving object trajectories. In: Proceedings of the 2005 ACM SIGMOD international conference on management of data—SIGMOD’05. ACM Press, New York, New York, USA, pp 491–502
Cover T, Hart P (1967) Nearest neighbor pattern classification. IEEE Trans Inf Theory 13:21–27
Das G, Gunopulos D (2003) Time series similarity and indexing. In: Ye N (ed) The handbook of data mining. Lawrence Erlbaum Associates, Mahwah, pp 279–304
Ding H, Trajcevski G, Scheuermann P et al (2008) Querying and mining of time series data: experimental comparison of representations and distance measures. In: Proceedings of the VLDB endowment, vol 1. pp 1542–1552
Dudani SA (1976) The distance-weighted k-nearest-neighbor rule. IEEE Trans Syst Man Cybern SMC-6:325–327
Esling P, Agon C (2012) Time-series data mining. ACM Comput Surv 45:12:1–12:34
Faloutsos C, Ranganathan M, Manolopoulos Y (1994) Fast subsequence matching in time-series databases. In: Proceedings of ACM SIGMOD record, vol 23. pp 419–429
Fix E, Hodges JL (1989) Discriminatory analysis. Nonparametric discrimination: consistency properties. Int Stat Rev 57:238–247
García S, Fernández A, Luengo J, Herrera F (2009) A study of statistical techniques and performance measures for genetics-based machine learning: accuracy and interpretability. Soft Comput 13:959–977
García S, Herrera F (2008) An extension on “statistical comparisons of classifiers over multiple data sets” for all pairwise comparisons. J Mach Learn Res 9:2677–2694
Goldin DQ, Kanellakis PC (1995) On similarity queries for time-series data: Constraint specification and implementation. In: Montanari U, Rossi F (eds) Proceedings of principles and practice of constraint programming—CP’95. Springer, Berlin Heidelberg, pp 137–153
Górecki T, Łuczak M (2013) Using derivatives in time series classification. Data Min Knowl Discov 26:310–331
Gou J, Du L, Zhang Y, Xiong T (2012) A new distance-weighted k-nearest neighbor classifier. J Inf Comput Sci 9:1429–1436
Gou J, Xiong T, Kuang Y (2011) A novel weighted voting for k-nearest neighbor rule. J Comput 6:833–840
Han J, Kamber M (2006) Data mining: concepts and techniques, 2nd edn. Morgan Kaufmann Publishers Inc., San Francisco
Hand DJ, Mannila H, Smyth P (2001) Principles of data mining. MIT Press, Cambridge
Itakura F (1975) Minimum prediction residual principle applied to speech recognition. IEEE Trans Acoust Speech Signal Process 23:67–72
Jeong Y-S, Jeong MK, Omitaomu OA (2011) Weighted dynamic time warping for time series classification. Pattern Recogn 44:2231–2240
Keogh E, Ratanamahatana CA (2005) Exact indexing of dynamic time warping. Knowl Inf Syst 7:358–386
Keogh E, Zhu Q, Hu B, et al (2011) The UCR time series classification/clustering homepage. www.cs.ucr.edu/~eamonn/time_series_data/
Kotsiantis SB (2007) Supervised machine learning: a review of classification techniques. In: Proceedings of the 2007 conference on emerging artificial intelligence applications in computer engineering: real word AI systems with applications in eHealth, HCI, information retrieval and pervasive technologies. IOS Press, Amsterdam, pp 3–24
Kurbalija V, Ivanović M, Budimac Z (2009) Case-based curve behaviour prediction. Softw Pract Exp 39:81–103
Kurbalija V, Ivanović M, von Bernstorff C et al (2014a) Matching observed with empirical reality—what you see is what you get? Fundam Inform 129:133–147
Kurbalija V, Radovanović M, Geler Z, Ivanović M (2010) A framework for time-series analysis. In: Dicheva D, Dochev D (eds) Artificial intelligence: methodology, systems, and applications. Springer, Berlin, Heidelberg, pp 42–51
Kurbalija V, Radovanović M, Geler Z, Ivanović M (2011) The influence of global constraints on DTW and LCS similarity measures for time-series databases. In: Dicheva D, Markov Z, Stefanova E (eds) Third international conference on software, services and semantic technologies S3T 2011. Springer, Berlin, Heidelberg, pp 67–74
Kurbalija V, Radovanović M, Geler Z, Ivanović M (2014b) The influence of global constraints on similarity measures for time-series databases. Knowl Based Syst 56:49–67
Kurbalija V, Radovanović M, Ivanović M et al (2014c) Time-series analysis in the medical domain: a study of Tacrolimus administration and influence on kidney graft function. Comput Biol Med 50:19–31
Kurbalija V, von Bernstorff C, Burkhard H-D et al (2012) Time-series mining in a psychological domain. In: Proceedings of the fifth Balkan conference in informatics on—BCI ’12. ACM Press, New York, New York, USA, pp 58–63
Larose DT (2005) Discovering knowledge in data. Wiley, Hoboken
Laxman S, Sastry PS (2006) A survey of temporal data mining. Sadhana 31:173–198
Macleod J, Luk A, Titterington D (1987) A re-examination of the distance-weighted k-nearest neighbor classification rule. IEEE Trans Syst Man Cybern 17:689–696
Marteau P-F (2009) Time warp edit distance with stiffness adjustment for time series matching. IEEE Trans Pattern Anal Mach Intell 31:306–318
Mitchell TM (1997) Machine learning. McGraw-Hill Inc, New York
Mitrović D, Geler Z, Ivanović M (2012) Distributed distance matrix generator based on agents. In: Proceedings of the 2nd international conference on web intelligence, mining and semantics—WIMS’12. ACM Press, New York, New York, USA, pp 40:1–40:6
Mitrović D, Ivanović M, Geler Z (2014) Agent-based distributed computing for dynamic networks. Inf Technol Control 43:88–97
Morse MD, Patel JM (2007) An efficient and accurate method for evaluating time series similarity. In: Proceedings of the 2007 ACM SIGMOD international conference on management of data—SIGMOD’07. ACM Press, New York, New York, USA, pp 569–580
Nanopoulos A, Alcock R, Manolopoulos Y (2001) Feature-based classification of time-series data. Int J Comput Res 10:49–61
Pao T-L, Chen Y-T, Yeh J-H et al (2007) A comparative study of different weighting schemes on KNN-based emotion recognition in Mandarin speech. In: Huang D-S, Heutte L, Loog M (eds) Advanced intelligent computing theories and applications. With aspects of theoretical and methodological issues. Springer, Berlin, Heidelberg, pp 997–1005
Pao T-L, Chen Y-T, Yeh J-H, Chang Y-H (2005) Emotion recognition and evaluation of Mandarin speech using weighted D-KNN classification. In: Proceedings of the 17th conference on computational linguistics and speech processing, ROCLING 2005, Taiwan, ROC, 2005. Association for Computational Linguistics and Chinese Language Processing (ACLCLP), Taiwan, pp 203–212
Pavlovic V, Frey BJ, Huang TS (1999) Time-series classification using mixed-state dynamic Bayesian networks. In: Proceedings of 1999 IEEE computer society conference on computer vision and pattern recognition (Cat. No PR00149). IEEE Computer Society, pp 609–615
Radovanović M, Nanopoulos A, Ivanović M (2010a) Time-series classification in many intrinsic dimensions. In: Proceedings of the 2010 SIAM international conference on data mining. Society for Industrial and Applied Mathematics, Philadelphia, PA, pp 677–688
Radovanović M, Nanopoulos A, Ivanović M (2010b) Hubs in space: popular nearest neighbors in high-dimensional data. J Mach Learn Res 11:2487–2531
Ratanamahatana CA, Lin J, Gunopulos D et al (2005) Mining time series data. In: Maimon O, Rokach L (eds) Data mining and knowledge discovery handbook. Springer, New York, pp 1069–1103
Ratanamahatana CA, Keogh E (2005) Three myths about dynamic time warping data mining. In: Proceedings of the 2005 SIAM international conference on data mining. Society for Industrial and Applied Mathematics, Philadelphia, PA, pp 506–510
Rodríguez JJ, Alonso CJ (2004) Interval and dynamic time warping-based decision trees. In: Proceedings of the 2004 ACM symposium on applied computing. ACM, New York, NY, USA, pp 548–552
Rodríguez JJ, Alonso CJ, Boström H (2000) Learning first order logic time series classifiers: rules and boosting. In: Zighed D, Komorowski J, Żytkow J (eds) Principles of data mining and knowledge discovery. Springer, Berlin, Heidelberg, pp 299–308
Sakoe H, Chiba S (1978) Dynamic programming algorithm optimization for spoken word recognition. IEEE Trans Acoust Speech Signal Process 26:43–49
Shi T, Wang P, Wang J-S, Yue S (2012) Application of grid-based k-means clustering algorithm for optimal image processing. Comput Sci Inf Syst 9:1679–1696
Skopal T, Bustos B (2011) On nonmetric similarity search problems in complex domains. ACM Comput Surv 43:34:1–34:50
Stojanović R, Knežević S, Karadaglić D, Devedžić G (2013) Optimization and implementation of the wavelet based algorithms for embedded biomedical signal processing. Comput Sci Inf Syst 10:503–523
Takigawa Y, Hotta S, Kiyasu S, Miyahara S (2005) Pattern classification using weighted average patterns of categorical k-nearest neighbors. In: Proceedings of the 1st international workshop on camera-based document analysis and recognition. pp 111–118
Tomašev N, Mladenić D (2012) Nearest neighbor voting in high dimensional data: learning from past occurrences. Comput Sci Inf Syst 9:691–712
Vlachos M, Kollios G, Gunopulos D (2002) Discovering similar multidimensional trajectories. In: Proceedings 18th international conference on data engineering. IEEE Computer Society, pp 673–684
Wu X, Kumar V, Quinlan JR et al (2008) Top 10 algorithms in data mining. Knowl Inf Syst 14:1–37
Wu Y, Chang EY (2004) Distance-function design and fusion for sequence data. In: Proceedings of the thirteenth ACM conference on information and knowledge management—CIKM’04. ACM Press, New York, New York, USA, pp 324–333
Wu Y-L, Agrawal D, El Abbadi A (2000) A comparison of DFT and DWT based similarity search in time-series databases. In: Proceedings of the ninth international conference on information and knowledge management—CIKM ’00. ACM Press, New York, New York, USA, pp 488–495
Xi X, Keogh E, Shelton C et al (2006) Fast time series classification using numerosity reduction. In: Proceedings of the 23rd international conference on machine learning—ICML’06. ACM Press, New York, New York, USA, pp 1033–1040
Ye L, Keogh E (2009) Time series shapelets. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining—KDD’09. ACM Press, New York, New York, USA, pp 947–956
Zavrel J (1997) An empirical re-examination of weighted voting for k-NN. In: Proceedings of the 7th Belgian–Dutch conference on machine learning. pp 139–148
Zhang H, Ho TB, Lin MS (2004) A non-parametric wavelet feature extractor for time series classification. In: Dai H, Srikant R, Zhang C (eds) Advances in knowledge discovery and data mining. Springer, Berlin, Heidelberg, pp 595–603


Acknowledgments

The authors would like to thank Eamonn Keogh for collecting and making available the UCR time-series datasets, as well as everyone who contributed data to the collection, without whom the presented work would not have been possible. V. Kurbalija, M. Radovanović and M. Ivanović thank the Serbian Ministry of Education, Science and Technological Development for support through Project No. OI174023, “Intelligent Techniques and their Integration into Wide-Spectrum Decision Support.”

Author information

Authors and Affiliations

Department of Media Studies, Faculty of Philosophy, University of Novi Sad, Dr Zorana Đinđića 2, 21000, Novi Sad, Serbia
Zoltan Geler
Department of Mathematics and Informatics, Faculty of Sciences, University of Novi Sad, Trg D. Obradovića 4, 21000, Novi Sad, Serbia
Vladimir Kurbalija, Miloš Radovanović & Mirjana Ivanović


Corresponding author

Correspondence to Zoltan Geler.

Appendix

Tables 20, 21, 22, 23 and 24 contain the classification errors and the values of the parameter k obtained for the analyzed similarity measures (Euclidean distance, DTW, LCS, ERP and EDR). Due to lack of space, the values in these tables are rounded to three decimal places.
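The tables pair each error rate with a value of k. One common way to obtain such a k, assumed here rather than taken from the paper, is to estimate the leave-one-out error of the weighted kNN rule on the training set for each candidate k and keep the best one. The sketch below does this from a precomputed distance matrix with Dudani weights; `loo_errors` and `best_k` are hypothetical names, and preferring the smallest k among equally good values is likewise an assumption.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def dudani_weights(dists: Sequence[float]) -> List[float]:
    """Dudani weights for neighbor distances sorted in ascending order."""
    d1, dk = dists[0], dists[-1]
    if dk == d1:
        return [1.0] * len(dists)
    return [(dk - d) / (dk - d1) for d in dists]


def loo_errors(
    dist: Sequence[Sequence[float]], labels: Sequence[Hashable], max_k: int
) -> Dict[int, float]:
    """Leave-one-out error rate of Dudani-weighted kNN for k = 1..max_k.

    dist is a precomputed n x n distance matrix; max_k must be smaller than n.
    """
    n = len(labels)
    wrong = [0] * (max_k + 1)  # wrong[k]: misclassified series for this k
    for i in range(n):
        # Every other series ordered by distance to series i (i is held out).
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        for k in range(1, max_k + 1):
            nearest = order[:k]
            weights = dudani_weights([dist[i][j] for j in nearest])
            votes: Dict[Hashable, float] = defaultdict(float)
            for w, j in zip(weights, nearest):
                votes[labels[j]] += w
            if max(votes, key=votes.get) != labels[i]:
                wrong[k] += 1
    return {k: wrong[k] / n for k in range(1, max_k + 1)}


def best_k(errors: Dict[int, float]) -> int:
    # Lowest error first; among equal errors prefer the smaller k.
    return min(errors, key=lambda k: (errors[k], k))


# Toy example: 1-D "series" compared with absolute difference.
xs = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
ys = ["a", "a", "a", "b", "b", "b"]
matrix = [[abs(p - q) for q in xs] for p in xs]
errs = loo_errors(matrix, ys, max_k=3)
print(errs, best_k(errs))
```

Working from a precomputed matrix means each (possibly expensive) elastic distance is evaluated once and then reused for every candidate k.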

Table 20 Classification errors and the values of the parameter k obtained for Euclidean distance


Table 21 Classification errors and the values of the parameter k obtained for DTW


Table 22 Classification errors and the values of the parameter k obtained for LCS


Table 23 Classification errors and the values of the parameter k obtained for ERP


Table 24 Classification errors and the values of the parameter k obtained for EDR



About this article

Cite this article

Geler, Z., Kurbalija, V., Radovanović, M. et al. Comparison of different weighting schemes for the kNN classifier on time-series data. Knowl Inf Syst 48, 331–378 (2016). https://doi.org/10.1007/s10115-015-0881-0


Received: 26 December 2014
Revised: 06 August 2015
Accepted: 14 September 2015
Published: 25 September 2015
Issue Date: August 2016
DOI: https://doi.org/10.1007/s10115-015-0881-0
