Data Quality Identification Model for Power Big Data

Zheng, Haijie; Tian, Bing; Liu, Xiaobao; Zhang, Wenbin; Liu, Shenqi; Wang, Cong

doi:10.1007/978-981-19-5209-8_2

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1629)

Included in the following conference series:

International Conference of Pioneering Computer Scientists, Engineers and Educators

Abstract

Data quality identification is an important task in power big data. Abnormal data hamper the effective utilization of power big data, and the lack of labeled data makes detecting them more challenging. To address this, a data quality identification model for power big data is proposed that detects abnormal data in massive power datasets. In this model, power data are grouped and then mapped into different feature spaces using data augmentation. Tri-training is then applied to detect abnormal data in the grouped power data across these feature spaces. Experiments and simulations demonstrate the effectiveness of the proposed model.
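
The idea of tri-training over several feature spaces can be illustrated with a minimal sketch. It is not the authors' implementation: the three feature maps (`view_level`, `view_spread`, `view_diff`), the 1-nearest-neighbour base classifier, and the toy load windows are all assumptions chosen for illustration, and the error-rate checks and bootstrap sampling of the original tri-training algorithm are omitted. Each classifier is retrained on the labeled windows plus the unlabeled windows on which the other two classifiers agree, and the final label is a majority vote.

```python
from collections import Counter
from statistics import mean, pstdev

# Hypothetical feature maps standing in for the augmentation step:
# each maps one window of readings into a different feature space.
def view_level(w):
    return [mean(w), max(w)]

def view_spread(w):
    return [pstdev(w), max(w) - min(w)]

def view_diff(w):
    return [max(abs(b - a) for a, b in zip(w, w[1:]))]

VIEWS = [view_level, view_spread, view_diff]

class OneNN:
    """1-nearest-neighbour classifier (an assumed base learner)."""
    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self

    def predict_one(self, x):
        dists = [sum((a - b) ** 2 for a, b in zip(x, row)) for row in self.X]
        return self.y[dists.index(min(dists))]

def tri_train(labeled, labels, unlabeled, rounds=2):
    """Simplified tri-training: one classifier per feature view."""
    models = [OneNN().fit([v(w) for w in labeled], labels) for v in VIEWS]
    for _ in range(rounds):
        updated = []
        for i, view in enumerate(VIEWS):
            j, k = [m for m in range(3) if m != i]
            extra_w, extra_y = [], []
            for w in unlabeled:
                pj = models[j].predict_one(VIEWS[j](w))
                pk = models[k].predict_one(VIEWS[k](w))
                if pj == pk:  # the two peer classifiers agree: pseudo-label
                    extra_w.append(w)
                    extra_y.append(pj)
            X = [view(w) for w in labeled + extra_w]
            updated.append(OneNN().fit(X, labels + extra_y))
        models = updated
    return models

def predict(models, w):
    """Majority vote over the three view-specific classifiers."""
    votes = [m.predict_one(v(w)) for m, v in zip(models, VIEWS)]
    return Counter(votes).most_common(1)[0][0]

# Toy data (invented for illustration): windows of four load readings.
labeled = [
    [10.0, 10.2, 9.9, 10.1],
    [9.8, 10.0, 10.1, 9.9],
    [10.0, 50.0, 10.0, 9.9],   # spike
    [10.1, 0.0, 0.0, 10.0],    # dropout
]
labels = ["normal", "normal", "abnormal", "abnormal"]
unlabeled = [
    [10.0, 9.9, 10.1, 10.0],
    [9.9, 10.1, 10.0, 10.2],
    [10.0, 10.1, 48.0, 10.0],
    [0.0, 0.0, 9.9, 10.0],
    [10.2, 10.0, 9.8, 10.1],
]

models = tri_train(labeled, labels, unlabeled)
for window in ([10.1, 10.0, 9.9, 10.0], [10.0, 45.0, 10.2, 9.8]):
    print(window, predict(models, window))
```

Using one classifier per feature view, rather than three bootstrap samples of a single view, is one way to obtain the diversity that tri-training relies on when few labels are available; whether it matches the grouping and augmentation used in the chapter is not stated in the abstract.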

Supported by the Science and Technology Project of State Grid Shandong Electric Power Company: “Research on the Key Technology of Heterogeneous Graph Anomaly Pattern Recognition Governance Based on Attention Mechanism” (Grant No. 2020A-135).

Author information

Authors and Affiliations

State Grid Shandong Electric Power Company Information and Communication Company, Jinan, 250000, Shandong, China
Haijie Zheng, Bing Tian, Xiaobao Liu, Wenbin Zhang, Shenqi Liu & Cong Wang

Corresponding author

Correspondence to Haijie Zheng.

Editor information

Editors and Affiliations

Southwest Petroleum University, Chengdu, China
Yang Wang
University of Electronic Science and Technology of China, Chengdu, China
Guobin Zhu
Harbin Engineering University, Harbin, China
Qilong Han
Southwest Petroleum University, Chengdu, China
Liehui Zhang
Harbin University of Science and Technology, Harbin, China
Xianhua Song
National Academy of Guo Ding Institute of Data Sciences, Beijing, China
Zeguang Lu

Cite this paper

Zheng, H., Tian, B., Liu, X., Zhang, W., Liu, S., Wang, C. (2022). Data Quality Identification Model for Power Big Data. In: Wang, Y., Zhu, G., Han, Q., Zhang, L., Song, X., Lu, Z. (eds) Data Science. ICPCSEE 2022. Communications in Computer and Information Science, vol 1629. Springer, Singapore. https://doi.org/10.1007/978-981-19-5209-8_2

DOI: https://doi.org/10.1007/978-981-19-5209-8_2
Published: 10 August 2022
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-5208-1
Online ISBN: 978-981-19-5209-8
eBook Packages: Computer Science, Computer Science (R0)
