INSIGHT: Efficient and Effective Instance Selection for Time-Series Classification

Buza, Krisztian; Nanopoulos, Alexandros; Schmidt-Thieme, Lars

doi:10.1007/978-3-642-20847-8_13

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6635)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Abstract

Time-series classification is a widely studied data mining task with many scientific and industrial applications. Recent research in this domain has shown that the simple nearest-neighbor classifier using Dynamic Time Warping (DTW) as its distance measure performs exceptionally well, in most cases outperforming more advanced classification algorithms. Instance selection is a commonly applied approach for improving the efficiency of the nearest-neighbor classifier with respect to classification time. This approach reduces the size of the training set by selecting the most representative instances and using only those when classifying new instances. In this paper, we introduce a novel instance selection method that exploits the hubness phenomenon in time-series data: a few instances tend to appear as nearest neighbors much more frequently than the remaining instances. Based on hubness, we propose a framework for score-based instance selection, which is combined with a principled approach for selecting instances that optimize the coverage of the training data. We discuss the theoretical considerations of casting the instance selection problem as a graph-coverage problem and analyze the resulting complexity. We experimentally compare the proposed method, denoted INSIGHT, against FastAWARD, a state-of-the-art instance selection method for time series. Our results indicate substantial improvements in classification accuracy and a drastic reduction (orders of magnitude) in execution time.
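
The idea of hubness-driven, score-based selection can be illustrated with a minimal Python sketch. The abstract does not state the score function, so the sketch assumes one plausible choice: count how often an instance is the nearest neighbor of another training instance with the same label (a "good" occurrence), then keep the top-scoring instances. The function names (`dtw`, `good_occurrence_scores`, `select_instances`, `classify_1nn`) and the absolute-difference local cost are illustrative assumptions, not the authors' implementation, and the graph-coverage analysis is not reproduced here.

```python
from math import inf


def dtw(a, b):
    """Unconstrained DTW distance with |x - y| as the local cost (assumed)."""
    n, m = len(a), len(b)
    prev = [inf] * (m + 1)
    prev[0] = 0.0  # empty prefix aligned with empty prefix
    for i in range(1, n + 1):
        curr = [inf] * (m + 1)
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping steps.
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[m]


def good_occurrence_scores(series, labels, k=1):
    """Assumed score: how often instance j is among the k nearest neighbors
    of another training instance that carries the same label as j."""
    n = len(series)
    scores = [0] * n
    for i in range(n):
        neighbors = sorted(
            (dtw(series[i], series[j]), j) for j in range(n) if j != i
        )
        for _, j in neighbors[:k]:
            if labels[j] == labels[i]:
                scores[j] += 1
    return scores


def select_instances(series, labels, num_selected, k=1):
    """Return the indices of the num_selected highest-scoring instances."""
    scores = good_occurrence_scores(series, labels, k)
    ranked = sorted(range(len(series)), key=lambda i: -scores[i])
    return sorted(ranked[:num_selected])


def classify_1nn(query, series, labels, selected):
    """1-NN classification under DTW, restricted to the selected instances."""
    best = min(selected, key=lambda i: dtw(query, series[i]))
    return labels[best]
```

Once the selection is computed, classifying a query needs one DTW computation per selected instance rather than one per training instance, which is where the reduction in classification time comes from. Computing the scores themselves still requires all pairwise DTW distances within the training set in this naive form.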

Author information

Authors and Affiliations

Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany
Krisztian Buza, Alexandros Nanopoulos & Lars Schmidt-Thieme

Editor information

Editors and Affiliations

Shenzhen Institutes of Advanced Technology (SIAT), Chinese Academy of Sciences, 518055, Shenzhen, China
Joshua Zhexue Huang
Faculty of Engineering and Information Technology, Center for Quantum Computation and Intelligent Systems, Data Sciences and Knowledge Discovery Lab, University of Technology Sydney, 2007, Sydney, NSW, Australia
Longbing Cao
Department of Computer Science and Engineering, University of Minnesota, 55455, Minneapolis, MN, USA
Jaideep Srivastava

About this paper

Cite this paper

Buza, K., Nanopoulos, A., Schmidt-Thieme, L. (2011). INSIGHT: Efficient and Effective Instance Selection for Time-Series Classification. In: Huang, J.Z., Cao, L., Srivastava, J. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2011. Lecture Notes in Computer Science, vol 6635. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20847-8_13

DOI: https://doi.org/10.1007/978-3-642-20847-8_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20846-1
Online ISBN: 978-3-642-20847-8
eBook Packages: Computer Science, Computer Science (R0)
