Early Classification of Multivariate Time Series on Distributed and In-Memory Platforms

Tseng, Vincent S.; Huang, Huai-Shuo; Huang, Chia-Wei; Wang, Ping-Feng; Li, Chu-Feng

doi:10.1007/978-3-319-67274-8_1

Conference paper
First Online: 07 October 2017

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10526)

Abstract

With the growing popularity of Internet of Things (IoT) applications, many kinds of active sensors are being deployed, and multivariate time series datasets are being generated rapidly. Early classification of multivariate time series is an emerging topic in data mining because of its wide applicability across many domains. What distinguishes early classification is that it uses only the earlier part of a time series to reach classification results with the same accuracy as methods that use the complete time series. Although a number of relevant studies have been presented recently, most of them did not consider data scale and execution efficiency simultaneously. The main research issue of this paper is how to mine interpretable patterns from multivariate time series data, with which effective classification models can be constructed to ensure both accuracy and earliness. To address data scale and execution efficiency together, we explore distributed in-memory computing techniques and multivariate shapelet-based approaches to construct a Spark-based in-memory mining framework that parallelizes the feature extraction process. We implement the framework with the Multivariate Shapelets Detection (MSD) method as a base example. Through empirical evaluation on various kinds of sensor datasets, the scalability of the framework is evaluated in terms of efficiency while ensuring the same accuracy and reliability in early classification of multivariate time series. This work is the first to realize early classification of multivariate time series on the Spark distributed in-memory computing platform, and it can serve as a good base for extension to other time series classification methods based on shapelet feature extraction.
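To make the idea of shapelet-based early classification concrete, the following is a minimal sketch in plain Python, not the authors' implementation and not the MSD algorithm as published. It assumes a multivariate shapelet is a set of equal-length segments (one per variable) that must all match at the same time offset, each within its own distance threshold. The function names (`window_distance`, `candidates`, `earliest_match`), the Euclidean distance, and the toy data are illustrative assumptions. Candidate shapelets are independent of one another, which is the property that would let a framework distribute their evaluation, for example by passing a scoring function to an RDD `map` in Spark; here the built-in `map` stands in for that step.

```python
from math import sqrt
from typing import List, Optional, Sequence

# A multivariate series: one inner sequence of readings per variable.
Series = Sequence[Sequence[float]]


def window_distance(segment: Sequence[float], window: Sequence[float]) -> float:
    """Euclidean distance between two equal-length sequences."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(segment, window)))


def candidates(series: Series, length: int) -> List[List[Sequence[float]]]:
    """All multivariate subsequences of the given length (candidate shapelets)."""
    n = min(len(var) for var in series)
    return [[var[s:s + length] for var in series] for s in range(n - length + 1)]


def earliest_match(shapelet: Series, thresholds: Sequence[float],
                   series: Series) -> Optional[int]:
    """Number of time points that must be observed before the shapelet matches.

    A match requires every variable to be within its threshold at the same
    start offset. Returns None if no window of the series matches.
    """
    length = len(shapelet[0])
    n = min(len(var) for var in series)
    for start in range(n - length + 1):
        if all(
            window_distance(shapelet[v], series[v][start:start + length]) <= thresholds[v]
            for v in range(len(shapelet))
        ):
            return start + length
    return None


# Toy data (hypothetical): two variables, eight time points each.
series = [[0, 0, 1, 2, 1, 0, 0, 0], [5, 5, 5, 4, 3, 4, 5, 5]]
flat = [[0] * 8, [5] * 8]
shapelet = [[1, 2, 1], [5, 4, 3]]
thresholds = [0.5, 0.5]

# Each series is evaluated independently, so this map could be distributed.
dataset = [series, flat]
points = list(map(lambda s: earliest_match(shapelet, thresholds, s), dataset))
print(points)  # [5, None]
```

In this toy run the first series can be classified after 5 of its 8 time points, while the flat series never matches. A real method would also have to choose the thresholds and rank candidates by how well they separate classes, which this sketch leaves out.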

Acknowledgement

This study was conducted under the “Complex Event Processing System” project of the Institute for Information Industry, which is subsidized by the Ministry of Economic Affairs of the Republic of China.

Author information

Authors and Affiliations

National Chiao Tung University, Hsinchu, Taiwan, Republic of China
Vincent S. Tseng, Huai-Shuo Huang & Chia-Wei Huang
Institute for Information Industry, Taipei, Taiwan, Republic of China
Ping-Feng Wang & Chu-Feng Li

Corresponding author

Correspondence to Vincent S. Tseng.

Editor information

Editors and Affiliations

Seoul National University, Seoul, Korea (Republic of)
U Kang
School of Information Systems, Singapore Management University, Singapore, Singapore
Ee-Peng Lim
Chinese University of Hong Kong, Hong Kong, China
Jeffrey Xu Yu
Kangwon National University, Chuncheon, Korea (Republic of)
Yang-Sae Moon

About this paper

Cite this paper

Tseng, V.S., Huang, HS., Huang, CW., Wang, PF., Li, CF. (2017). Early Classification of Multivariate Time Series on Distributed and In-Memory Platforms. In: Kang, U., Lim, EP., Yu, J., Moon, YS. (eds) Trends and Applications in Knowledge Discovery and Data Mining. PAKDD 2017. Lecture Notes in Computer Science, vol 10526. Springer, Cham. https://doi.org/10.1007/978-3-319-67274-8_1

DOI: https://doi.org/10.1007/978-3-319-67274-8_1
Published: 07 October 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67273-1
Online ISBN: 978-3-319-67274-8
eBook Packages: Computer Science, Computer Science (R0)