Abstract
This paper studies a feature extraction method for intelligent data quality detection. A text segmentation model, word clustering, similarity calculation, and related methods were applied to a data asset list to generate a keyword library of data quality detection features and a data asset feature list, which were then used to perform data quality detection. The data knowledge in the data asset list was first used to extract data characteristics and accumulate business knowledge. The adaptability of the method to different data types was also studied, and general data quality detection was carried out for a large volume of discrete data. The results showed that extracting data features automatically from the data asset list, rather than manually, improved efficiency and addressed the shortcomings of incomplete statistics and insufficient feature extraction accuracy. In addition, the generality of data quality detection was further improved and its blind scanning range was reduced, leading to a significant improvement in both the efficiency and the accuracy of intelligent data quality detection.
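The abstract does not give implementation details, so the following is only a minimal sketch of the general idea: segment the descriptions in a data asset list, compare them with a keyword library by similarity, and assign a detection feature to each field. Everything in it is an assumption made for illustration: the regular-expression tokenizer stands in for the text segmentation model, cosine similarity over word counts stands in for the similarity calculation, the word clustering step is omitted, and the names `KEYWORD_LIBRARY`, `assign_features`, the sample fields, and the threshold value are hypothetical.

```python
import re
from collections import Counter
from math import sqrt


def tokenize(text):
    """Split text into lowercase word tokens (stand-in for a segmentation model)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical keyword library: detection feature -> characteristic keywords.
KEYWORD_LIBRARY = {
    "date_format": "date time timestamp day month year",
    "numeric_range": "voltage current power amount value kw",
    "identifier": "id number code serial",
}


def assign_features(asset_list, threshold=0.2):
    """Map each field to its most similar detection feature, or None below the threshold."""
    library = {name: Counter(tokenize(words)) for name, words in KEYWORD_LIBRARY.items()}
    result = {}
    for field, description in asset_list.items():
        vec = Counter(tokenize(description))
        scores = {name: cosine(vec, kw) for name, kw in library.items()}
        best = max(scores, key=scores.get)
        result[field] = best if scores[best] >= threshold else None
    return result


# Hypothetical data asset list: field name -> business description.
assets = {
    "meter_id": "serial number of the meter",
    "read_time": "timestamp of the reading date",
    "load_kw": "active power value in kw",
    "remark": "free text comment",
}
features = assign_features(assets)
```

Fields whose description matches no keyword group, such as `remark` here, are left unassigned, which loosely mirrors the idea of narrowing the range that quality checks have to scan blindly.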
Acknowledgments
This work is supported by the science and technology project of State Grid Corporation of China: "Research on data governance and knowledge mining technology of power IOT based on Artificial Intelligence" (Grant No. 5700-202058184A-0-0-00).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Liu, W., Lei, S., Zheng, X., Liang, X. (2022). Research on Feature Extraction Method of Data Quality Intelligent Detection. In: Rage, U.K., Goyal, V., Reddy, P.K. (eds) Database Systems for Advanced Applications. DASFAA 2022 International Workshops. DASFAA 2022. Lecture Notes in Computer Science, vol 13248. Springer, Cham. https://doi.org/10.1007/978-3-031-11217-1_27
DOI: https://doi.org/10.1007/978-3-031-11217-1_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-11216-4
Online ISBN: 978-3-031-11217-1
eBook Packages: Computer Science, Computer Science (R0)