Knowledge discovery and variable scale evaluation for long series data

Artificial Intelligence Review

Abstract

Knowledge discovery and evaluation is a challenging but rewarding process of automatically extracting useful information from databases. Because the collected data are heterogeneous, the knowledge they contain is uncertain, occurs at random, and varies in scale. This paper therefore presents an unsupervised knowledge discovery and variable scale evaluation model based on a new multi-feature fusion method. First, to capture multiple information features, an amplitude-frequency-shape based state description form is proposed, which analyzes the time series in terms of energy, phase, and knowledge similarity. Because the number and scale of knowledge fragments vary, a piecewise linear segmentation criterion is put forward based on the complexity and accuracy of information representation. A model-free knowledge discovery framework that requires no sample labels is then constructed to discover knowledge quickly and effectively. To handle the variable knowledge scale, a variable scale evaluation method is first proposed to distinguish multi-scale decision-making knowledge based on indicators of system stability and security; it can optimize the knowledge base and guide the decision-making process. Experimental results on heterogeneous activity datasets indicate that the proposed method can analyze the state of time series in general and discover knowledge efficiently from massive data. In addition, knowledge discovery and evaluation on a continuous decision system show that the proposed framework can meet the needs of knowledge discovery in complex environments and effectively distinguish knowledge, providing strong support for establishing a credible decision-making system.
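The abstract describes a piecewise linear segmentation criterion that balances the complexity and the accuracy of the representation, but the preview does not give its definition. The following is only a minimal illustrative sketch of that general idea, not the authors' criterion: it assumes a top-down splitting scheme, measures accuracy as the squared error of the chord joining a segment's endpoints, and charges a fixed, hypothetical `penalty` for each extra segment. All function and parameter names are assumptions introduced here.

```python
from typing import List, Sequence


def chord_sse(y: Sequence[float], start: int, end: int) -> float:
    """Squared error of approximating y[start..end] by the chord joining its endpoints."""
    length = end - start
    if length < 2:
        return 0.0  # a segment of one or two points is fitted exactly
    slope = (y[end] - y[start]) / length
    return sum(
        (y[start] + slope * (i - start) - y[i]) ** 2
        for i in range(start + 1, end)
    )


def split_segment(y: Sequence[float], start: int, end: int, penalty: float) -> List[int]:
    """Return interior breakpoints of y[start..end] found by top-down splitting.

    A split is accepted only if the error it removes exceeds `penalty`,
    i.e. the accuracy gain must pay for the added segment (complexity).
    """
    whole = chord_sse(y, start, end)
    best_gain, best_k = 0.0, None
    for k in range(start + 1, end):
        gain = whole - chord_sse(y, start, k) - chord_sse(y, k, end) - penalty
        if gain > best_gain:
            best_gain, best_k = gain, k
    if best_k is None:
        return []
    left = split_segment(y, start, best_k, penalty)
    right = split_segment(y, best_k, end, penalty)
    return left + [best_k] + right


def piecewise_linear_breakpoints(y: Sequence[float], penalty: float = 1.0) -> List[int]:
    """Indices of all segment endpoints, including the first and last sample."""
    if len(y) < 2:
        return list(range(len(y)))
    return [0] + split_segment(y, 0, len(y) - 1, penalty) + [len(y) - 1]


if __name__ == "__main__":
    series = [0, 1, 2, 3, 4, 3, 2, 1, 0]  # a single triangular peak
    print(piecewise_linear_breakpoints(series, penalty=1.0))
    print(piecewise_linear_breakpoints(series, penalty=100.0))
```

A small `penalty` favors accuracy and yields more, shorter segments; a large one favors a compact representation. The exhaustive search over split points is quadratic in the segment length per level, so this sketch is meant for short illustrative series rather than the long series the paper targets.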

Data availability

The UCI datasets analyzed during the current study are available from the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml). The blast furnace gas scheduling activities dataset that supports the findings of this study is not openly available due to the sharing agreement with the enterprise.

Acknowledgements

The authors wish to thank the Associate Editor and the anonymous reviewers for their valuable comments and constructive suggestions, which helped improve the presentation of the paper. This work was supported by the National Key R&D Program of China under Grant 2017YFA0700300, the National Natural Science Foundation of China under Grants 61833003, 61873048, and U1908218, and the Fundamental Research Funds for the Central Universities under Grant DUT22JC16.

Corresponding author

Correspondence to Zheng Lv.

About this article

Cite this article

Zhai, Y., Lv, Z., Zhao, J. et al. Knowledge discovery and variable scale evaluation for long series data. Artif Intell Rev 56, 3157–3180 (2023). https://doi.org/10.1007/s10462-022-10250-0

  • DOI: https://doi.org/10.1007/s10462-022-10250-0
