Outlier Detection Model Based on Autoencoder and Data Augmentation for High-Dimensional Sparse Data

  • Conference paper
Data Science (ICPCSEE 2023)

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 1879))

Abstract

This paper addresses three problems in anomaly detection for high-dimensional data: data imbalance, complex parameter tuning, and low accuracy. It proposes SEAOD, an anomaly detection model for high-dimensional sparse data based on an autoencoder and data augmentation. First, the model tackles data imbalance by combining a weighted SMOTE algorithm with the ENN algorithm to supplement the minority-class samples and generate a new dataset. Then, an attention mechanism computes feature similarity and determines the structure of the neural network so that the model can learn the data features. Finally, the autoencoder reduces the dimensionality of the data, mapping the sparse high-dimensional data to a low-dimensional space for anomaly detection, which mitigates the effect of the curse of dimensionality on detection algorithms. Experiments on 15 public datasets show that the model outperforms the comparison algorithms. The model was also validated on industrial air quality datasets, where it achieved the expected results, indicating its practicality.
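The first step of the abstract, oversampling the minority class and then cleaning the result, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the weighting scheme of the weighted SMOTE, so the sketch uses plain (unweighted) SMOTE interpolation followed by Edited Nearest Neighbours (ENN). The function names `smote`, `enn_keep_mask`, and `smote_enn`, the label convention (1 = minority), the neighbourhood size `k=3`, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def smote(X_min, n_new, k=3, rng=None):
    """Plain SMOTE (unweighted; an assumption, not the paper's weighted variant).

    Each synthetic point lies on the segment between a minority sample
    and one of its k nearest minority neighbours.
    """
    if len(X_min) < 2:
        raise ValueError("need at least two minority samples")
    rng = np.random.default_rng(rng)
    # Pairwise Euclidean distances inside the minority class.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a sample is not its own neighbour
    k = min(k, len(X_min) - 1)
    nn = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    neigh = nn[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[neigh] - X_min[base])


def enn_keep_mask(X, y, k=3):
    """ENN: keep a sample only if the majority of its k nearest
    neighbours share its label (binary labels 0/1, odd k)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    nn = np.argsort(d, axis=1)[:, :k]
    majority = (y[nn].mean(axis=1) > 0.5).astype(y.dtype)
    return majority == y


def smote_enn(X, y, k=3, seed=0):
    """Oversample class 1 up to the size of class 0, then clean with ENN."""
    n_new = max(0, int((y == 0).sum() - (y == 1).sum()))
    X_syn = smote(X[y == 1], n_new, k=k, rng=seed)
    X_all = np.vstack([X, X_syn])
    y_all = np.concatenate([y, np.ones(n_new, dtype=y.dtype)])
    keep = enn_keep_mask(X_all, y_all, k=k)
    return X_all[keep], y_all[keep]


# Toy data (hypothetical): 40 majority points and 6 minority points in 5-D.
data_rng = np.random.default_rng(0)
X = np.vstack([data_rng.normal(0, 1, (40, 5)), data_rng.normal(4, 1, (6, 5))])
y = np.array([0] * 40 + [1] * 6)
X_res, y_res = smote_enn(X, y)
print(X_res.shape, int((y_res == 1).sum()), int((y_res == 0).sum()))
```

The quadratic distance matrix keeps the sketch short but only suits small datasets; for larger or sparser data, a nearest-neighbour index would be the more practical choice.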

Acknowledgement

This work is supported by the National Key R&D Program of China under Grant No. 2020YFB1710200.

Author information

Corresponding authors

Correspondence to Qilong Han or Zhiqiang Ma.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Zhang, H., Ma, W., Han, Q., Ma, Z. (2023). Outlier Detection Model Based on Autoencoder and Data Augmentation for High-Dimensional Sparse Data. In: Yu, Z., et al. Data Science. ICPCSEE 2023. Communications in Computer and Information Science, vol 1879. Springer, Singapore. https://doi.org/10.1007/978-981-99-5968-6_14

  • DOI: https://doi.org/10.1007/978-981-99-5968-6_14

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-5967-9

  • Online ISBN: 978-981-99-5968-6

  • eBook Packages: Computer Science, Computer Science (R0)
