Abstract
Data quality identification is an important task in power big data. Abnormal data hamper the effective utilization of power big data, and the lack of labeled data makes detecting them more challenging. To address this, a data quality identification model for power big data is proposed that detects abnormal data in massive power datasets. In this model, power data are grouped and then mapped into different feature spaces using data augmentation. Tri-training is then applied to detect abnormal data across these feature spaces. Experiments and simulations demonstrate the effectiveness of the proposed model.
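The pipeline outlined in the abstract (several feature spaces, then tri-training on scarce labels) can be illustrated with a minimal sketch. This is not the authors' implementation: the three feature maps, the `NearestCentroid` stand-in classifier, and the toy readings are all hypothetical, and the loop is a simplified form of tri-training that omits the error-rate checks used in the full algorithm. Each classifier is retrained on the labeled windows plus any unlabeled windows on which the other two classifiers agree.

```python
# Simplified tri-training sketch over three hypothetical feature spaces.
# Labels: 0 = normal window, 1 = abnormal window (assumed encoding).

def view_spread(w):      # hypothetical view 1: range of the window
    return [max(w) - min(w)]

def view_step(w):        # hypothetical view 2: largest jump between neighbours
    return [max(abs(b - a) for a, b in zip(w, w[1:]))]

def view_deviation(w):   # hypothetical view 3: largest deviation from the mean
    m = sum(w) / len(w)
    return [max(abs(x - m) for x in w)]

VIEWS = [view_spread, view_step, view_deviation]

class NearestCentroid:
    """Stand-in classifier: predicts the label of the closest class centroid."""
    def fit(self, X, y):
        self.centroids = {}
        for label in sorted(set(y)):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_one(self, x):
        def sq_dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, c))
        return min(self.centroids, key=lambda lab: sq_dist(self.centroids[lab]))

def tri_train(labeled, labels, unlabeled, rounds=3):
    # One classifier per feature space, first trained on labeled data only.
    models = [NearestCentroid().fit([v(w) for w in labeled], labels) for v in VIEWS]
    for _ in range(rounds):
        updated = []
        for i, view in enumerate(VIEWS):
            j, k = [m for m in range(3) if m != i]
            extra_w, extra_y = [], []
            for w in unlabeled:
                pj = models[j].predict_one(VIEWS[j](w))
                pk = models[k].predict_one(VIEWS[k](w))
                if pj == pk:  # the other two agree: accept as a pseudo-label
                    extra_w.append(w)
                    extra_y.append(pj)
            X = [view(w) for w in labeled + extra_w]
            updated.append(NearestCentroid().fit(X, labels + extra_y))
        models = updated
    return models

def predict(models, w):
    # Majority vote of the three classifiers, each in its own feature space.
    votes = [m.predict_one(v(w)) for m, v in zip(models, VIEWS)]
    return max(set(votes), key=votes.count)

# Toy readings (made up for illustration): flat windows vs. windows with a spike.
labeled = [[1.0, 1.0, 1.0, 1.0], [1.1, 0.9, 1.0, 1.0],
           [1.0, 6.0, 1.0, 1.0], [1.0, 1.0, 6.0, 1.0]]
labels = [0, 0, 1, 1]
unlabeled = [[0.9, 1.0, 1.1, 1.0], [6.0, 1.0, 1.0, 1.0],
             [1.0, 1.0, 1.0, 6.0], [1.0, 1.05, 0.95, 1.0]]

models = tri_train(labeled, labels, unlabeled)
flat_label = predict(models, [1.0, 1.0, 1.02, 0.98])
spike_label = predict(models, [1.0, 1.0, 1.0, 7.0])
```

In this sketch the diversity among the three classifiers comes from the different feature maps rather than from bootstrap resampling, which is one plausible reading of the abstract and should be treated as an assumption.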
Supported by the Science and Technology Project of State Grid Shandong Electric Power Company: “Research on the Key Technology of Heterogeneous Graph Anomaly Pattern Recognition Governance Based on Attention Mechanism” (Grant No. 2020A-135).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Zheng, H., Tian, B., Liu, X., Zhang, W., Liu, S., Wang, C. (2022). Data Quality Identification Model for Power Big Data. In: Wang, Y., Zhu, G., Han, Q., Zhang, L., Song, X., Lu, Z. (eds) Data Science. ICPCSEE 2022. Communications in Computer and Information Science, vol 1629. Springer, Singapore. https://doi.org/10.1007/978-981-19-5209-8_2
DOI: https://doi.org/10.1007/978-981-19-5209-8_2
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-5208-1
Online ISBN: 978-981-19-5209-8
eBook Packages: Computer Science, Computer Science (R0)