Abstract
Knowledge discovery and evaluation is a challenging but rewarding process of automatically obtaining usable information from databases. Because the collected data are heterogeneous, the knowledge they contain is uncertain, occurs at random, and varies in scale. This paper therefore presents an unsupervised knowledge discovery and variable-scale evaluation model based on a new multi-feature fusion method. First, to address the multiple information features, an amplitude-frequency-shape based state description form is proposed, which can analyze a time series from the aspects of energy, phase, and knowledge similarity. Because the number and scale of knowledge fragments vary, a piecewise linear segmentation criterion is put forward based on the complexity and accuracy of information representation. A model-free knowledge discovery framework that requires no sample labels is then constructed to discover knowledge quickly and effectively. To handle the variable knowledge scale, a variable-scale evaluation method is first proposed to distinguish multi-scale decision-making knowledge based on indicators of system stability and security; it can optimize the knowledge base and guide the decision-making process. Experimental results on heterogeneous activity datasets indicate that the proposed method can analyze the state of time series comprehensively and discover knowledge efficiently from massive data. In addition, knowledge discovery and evaluation on a continuous decision system show that the proposed framework can meet the needs of knowledge discovery in complex environments and effectively distinguish knowledge, providing strong support for establishing a credible decision-making system.
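The trade-off between complexity and accuracy behind a piecewise linear segmentation criterion can be illustrated with a minimal sketch. This is not the authors' criterion, which the abstract does not specify; it assumes a generic penalized cost (total squared fitting error plus a fixed `penalty` per segment) and a greedy bottom-up merge. The names `line_sse`, `segment`, and `penalty` are hypothetical.

```python
def line_sse(y, start, end):
    """Sum of squared residuals of the least-squares line fitted to y[start:end]."""
    n = end - start
    if n <= 2:
        return 0.0  # one or two points are always fitted exactly
    xs = range(start, end)
    mean_x = sum(xs) / n
    mean_y = sum(y[start:end]) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y[x] - mean_y) for x in xs)
    slope = sxy / sxx
    return sum((y[x] - mean_y - slope * (x - mean_x)) ** 2 for x in xs)


def segment(y, penalty):
    """Greedy bottom-up segmentation; returns half-open (start, end) index pairs.

    Assumed cost: total fitting error + penalty * number of segments.
    Merging two neighbours removes one segment, so a merge lowers the cost
    exactly when it raises the fitting error by less than `penalty`.
    """
    # Start from the finest partition: consecutive pairs of points.
    bounds = list(range(0, len(y), 2)) + [len(y)]
    while len(bounds) > 2:
        best_k, best_increase = None, None
        for k in range(1, len(bounds) - 1):
            a, b, c = bounds[k - 1], bounds[k], bounds[k + 1]
            increase = line_sse(y, a, c) - line_sse(y, a, b) - line_sse(y, b, c)
            if best_increase is None or increase < best_increase:
                best_k, best_increase = k, increase
        if best_increase >= penalty:
            break  # every remaining merge would cost more accuracy than it saves
        del bounds[best_k]
    return list(zip(bounds[:-1], bounds[1:]))


if __name__ == "__main__":
    series = [0, 1, 2, 3, 4, 5, 4, 3, 2, 1, 0]  # a rise followed by a fall
    print(segment(series, penalty=0.5))
```

A small `penalty` favors accuracy and keeps more segments, while a large one favors a simpler representation; in this sketch the value has to be chosen by the user.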
Data availability
The UCI datasets analyzed during the current study are available from the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml). The blast furnace gas scheduling activities dataset that supports the findings of this study is not openly available due to the sharing agreement with the enterprise.
Acknowledgements
The authors wish to thank the Associate Editor and the anonymous reviewers for their valuable comments and constructive suggestions, which helped improve the presentation of the paper. This work was supported by the National Key R&D Program of China under Grant 2017YFA0700300, the National Natural Science Foundation of China under Grants 61833003, 61873048, and U1908218, and the Fundamental Research Funds for the Central Universities under Grant DUT22JC16.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhai, Y., Lv, Z., Zhao, J. et al. Knowledge discovery and variable scale evaluation for long series data. Artif Intell Rev 56, 3157–3180 (2023). https://doi.org/10.1007/s10462-022-10250-0
DOI: https://doi.org/10.1007/s10462-022-10250-0