
Shapelet Based Two-Step Time Series Positive and Unlabeled Learning

Regular Paper
Journal of Computer Science and Technology

Abstract

In the last decade, there has been significant progress in time series classification. However, in real-world industrial settings, obtaining high-quality labeled data is expensive and difficult, so the positive and unlabeled learning (PU-learning) problem has attracted growing attention. Current PU-learning approaches for time series data suffer from low accuracy due to the lack of negatively labeled time series. In this paper, we propose a novel shapelet based two-step (2STEP) PU-learning approach. In the first step, we generate shapelet features from the positive time series and use them to select a set of negative examples. In the second step, based on both the positive and the selected negative time series, we select the final features and build the classification model. Experimental results show that 2STEP improves the average F1 score on 15 datasets by 9.1% compared with the baselines, and achieves the highest F1 score on 10 of the 15 datasets.
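The abstract gives only the two-step outline; the paper itself details how candidate shapelets are discovered and how the final features are selected. As a rough, non-authoritative sketch of that outline, the Python fragment below assumes a set of candidate shapelets is already available and uses a simple centroid-distance heuristic (a stand-in for the paper's actual negative-selection procedure) to pick reliable negatives from the unlabeled set. All helper names (shapelet_distance, shapelet_transform, two_step_pu) and the neg_fraction parameter are illustrative, not taken from the paper.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def shapelet_distance(series, shapelet):
        # Minimum Euclidean distance between the shapelet and every
        # equal-length subsequence of the series (sliding window).
        m = len(shapelet)
        return min(np.linalg.norm(series[i:i + m] - shapelet)
                   for i in range(len(series) - m + 1))

    def shapelet_transform(X, shapelets):
        # Represent each series by its vector of minimum distances
        # to the candidate shapelets.
        return np.array([[shapelet_distance(s, sh) for sh in shapelets]
                         for s in X])

    def two_step_pu(X_pos, X_unlab, shapelets, neg_fraction=0.2):
        # Step 1: in shapelet-feature space, treat the unlabeled series
        # farthest from the positive centroid as reliable negatives.
        F_pos = shapelet_transform(X_pos, shapelets)
        F_unlab = shapelet_transform(X_unlab, shapelets)
        centroid = F_pos.mean(axis=0)
        dist = np.linalg.norm(F_unlab - centroid, axis=1)
        k = max(1, int(neg_fraction * len(X_unlab)))
        neg_idx = np.argsort(dist)[-k:]  # k series least similar to positives
        # Step 2: fit a binary classifier on positives plus selected negatives.
        X_train = np.vstack([F_pos, F_unlab[neg_idx]])
        y_train = np.r_[np.ones(len(F_pos)), np.zeros(k)]
        return LogisticRegression().fit(X_train, y_train)

    # Toy usage: noisy sinusoids as positives, white noise as unlabeled data;
    # candidate shapelets are simply subsequences cut from positive series.
    rng = np.random.default_rng(0)
    X_pos = np.array([np.sin(np.linspace(0, 6, 100))
                      + 0.1 * rng.standard_normal(100) for _ in range(20)])
    X_unlab = np.array([rng.standard_normal(100) for _ in range(50)])
    shapelets = [X_pos[0][10:40], X_pos[1][30:60]]
    clf = two_step_pu(X_pos, X_unlab, shapelets)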



Author information

Corresponding author

Correspondence to Peng Wang.

Supplementary Information

ESM 1 (PDF 321 kb)

About this article

Cite this article

Zhang, HB., Wang, P., Zhang, MM. et al. Shapelet Based Two-Step Time Series Positive and Unlabeled Learning. J. Comput. Sci. Technol. 38, 1387–1402 (2023). https://doi.org/10.1007/s11390-022-1320-9

