Data Quality Identification Model for Power Big Data

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1629)

Abstract

Data quality identification is an important task in power big data. Abnormal data hamper the effective use of power big data, and the lack of labeled data makes such data harder to detect. To address this, a data quality identification model for power big data is proposed that detects abnormal data in massive power datasets. In this model, power data are grouped and then mapped into different feature spaces using data augmentation. Tri-training is then applied to detect abnormal data in the different groups of power data across the different feature spaces. Experiments and simulations demonstrate the effectiveness of the proposed model.
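
As a rough illustration of the tri-training step named in the abstract, the following minimal sketch trains three classifiers, one per feature space, and lets any two that agree on an unlabeled sample pseudo-label it for the third. Everything here is an assumption made for illustration and is not taken from the paper: the three feature maps (`level`, `spread`, `largest_jump`), the nearest-centroid classifier, the toy load windows, and the simplification that omits the error-rate checks and bootstrap sampling of the original tri-training algorithm.

```python
from statistics import mean

# Hypothetical feature maps standing in for the paper's augmented feature spaces.
def level(window):
    return mean(window)

def spread(window):
    return max(window) - min(window)

def largest_jump(window):
    return max(abs(b - a) for a, b in zip(window, window[1:]))

VIEWS = [level, spread, largest_jump]

class CentroidClassifier:
    """Nearest-centroid classifier on one scalar feature (0 = normal, 1 = abnormal)."""

    def __init__(self, view):
        self.view = view
        self.centroids = {}

    def fit(self, windows, labels):
        self.centroids = {}
        for cls in (0, 1):
            feats = [self.view(w) for w, y in zip(windows, labels) if y == cls]
            if feats:  # skip a class that has no training samples
                self.centroids[cls] = mean(feats)
        return self

    def predict(self, window):
        f = self.view(window)
        return min(self.centroids, key=lambda cls: abs(f - self.centroids[cls]))

def tri_train(labeled, labels, unlabeled, rounds=3):
    """Simplified tri-training: two agreeing classifiers teach the third."""
    clfs = [CentroidClassifier(v).fit(labeled, labels) for v in VIEWS]
    for _ in range(rounds):
        pseudo_sets = []
        for i in range(3):
            j, k = [m for m in range(3) if m != i]
            # Keep only the unlabeled windows on which the other two agree.
            pseudo = [(w, clfs[j].predict(w)) for w in unlabeled
                      if clfs[j].predict(w) == clfs[k].predict(w)]
            pseudo_sets.append(pseudo)
        # Retrain every classifier only after all pseudo-labels are collected.
        for clf, pseudo in zip(clfs, pseudo_sets):
            xs = labeled + [w for w, _ in pseudo]
            ys = labels + [y for _, y in pseudo]
            clf.fit(xs, ys)
    return clfs

def predict(clfs, window):
    """Majority vote of the three classifiers."""
    return 1 if sum(c.predict(window) for c in clfs) >= 2 else 0

# Toy load windows (made-up numbers): three normal, three abnormal.
labeled = [[50, 51, 49, 50], [52, 51, 50, 51], [48, 49, 50, 49],
           [50, 120, 49, 50], [51, 0, 0, 50], [50, 51, 150, 52]]
labels = [0, 0, 0, 1, 1, 1]
unlabeled = [[49, 50, 51, 50], [50, 50, 130, 51], [51, 50, 49, 50],
             [50, 0, 51, 50], [50, 49, 50, 51]]

clfs = tri_train(labeled, labels, unlabeled)
print(predict(clfs, [50, 49, 140, 50]))  # window with a spike
print(predict(clfs, [50, 51, 50, 49]))   # flat window
```

In this toy setup the classifier on `level` alone misses dropouts, since zeros pull the mean toward neither class clearly; the majority vote over the three views is what compensates, which is the motivation for combining several feature spaces.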

Supported by the Science and Technology Project of State Grid Shandong Electric Power Company: “Research on the Key Technology of Heterogeneous Graph Anomaly Pattern Recognition Governance Based on Attention Mechanism” (Grant No. 2020A-135).

Author information

Corresponding author

Correspondence to Haijie Zheng.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Zheng, H., Tian, B., Liu, X., Zhang, W., Liu, S., Wang, C. (2022). Data Quality Identification Model for Power Big Data. In: Wang, Y., Zhu, G., Han, Q., Zhang, L., Song, X., Lu, Z. (eds) Data Science. ICPCSEE 2022. Communications in Computer and Information Science, vol 1629. Springer, Singapore. https://doi.org/10.1007/978-981-19-5209-8_2

  • DOI: https://doi.org/10.1007/978-981-19-5209-8_2

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-19-5208-1

  • Online ISBN: 978-981-19-5209-8

  • eBook Packages: Computer Science, Computer Science (R0)
