Abstract
This paper addresses three problems in anomaly detection on high-dimensional data: class imbalance, complex parameter tuning, and low accuracy. It proposes SEAOD, an anomaly detection model for high-dimensional sparse data based on an autoencoder and data augmentation. First, the model handles imbalanced data by applying the weighted SMOTE algorithm together with the ENN algorithm to supplement the minority-class samples and generate a new dataset. Then, an attention mechanism computes the feature similarity and determines the structure of the neural network so that the model can learn the data features. Finally, the autoencoder reduces the dimensionality of the data, mapping the sparse high-dimensional data to a low-dimensional space for anomaly detection and overcoming the impact of the curse of dimensionality on detection algorithms. Experimental results on 15 public datasets show that the model outperforms the comparison algorithms. The model was also validated on industrial air quality datasets, where it achieved the expected results, which indicates its practicality.
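The resampling step named in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It uses plain, unweighted SMOTE interpolation followed by a common ENN variant that cleans both classes, and the paper's weighting scheme is not modelled. The function names (`k_nearest`, `smote`, `enn`), the toy data, and the values of `k` are assumptions made for illustration only.

```python
import math
import random


def k_nearest(point, candidates, k):
    """Return the k candidates closest to `point` (Euclidean distance)."""
    return sorted(candidates, key=lambda c: math.dist(point, c))[:k]


def smote(minority, n_new, k=3, seed=0):
    """Plain SMOTE: interpolate between a minority sample and one of its
    k nearest minority neighbours. The paper's weighting is NOT modelled."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [m for m in minority if m is not base]
        neighbour = rng.choice(k_nearest(base, others, k))
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def enn(samples, labels, k=3):
    """Edited Nearest Neighbours: keep a sample only if a strict majority
    of its k nearest neighbours share its label."""
    kept = []
    for i, (x, y) in enumerate(zip(samples, labels)):
        rest = [(s, l) for j, (s, l) in enumerate(zip(samples, labels)) if j != i]
        nearest = sorted(rest, key=lambda sl: math.dist(x, sl[0]))[:k]
        agreeing = sum(1 for _, l in nearest if l == y)
        if agreeing * 2 > k:
            kept.append((x, y))
    return kept


# Toy data (hypothetical): label 0 = normal (majority), label 1 = anomaly.
normal = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.2, 0.4), (0.4, 0.2)]
anomalies = [(5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]

# Oversample the minority class until both classes have the same size.
new_points = smote(anomalies, n_new=len(normal) - len(anomalies), k=2)
samples = normal + anomalies + new_points
labels = [0] * len(normal) + [1] * (len(anomalies) + len(new_points))

# Clean the augmented set by removing samples that disagree with their neighbours.
balanced = enn(samples, labels, k=3)
```

Because each synthetic point is a convex combination of two real minority samples, it stays inside the region those samples span. ENN then removes only points whose neighbourhood is dominated by the other class, so on well-separated data like this toy set it is expected to leave the balanced set unchanged.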
Acknowledgement
This work is supported by the National Key R&D Program of China under Grant No. 2020YFB1710200.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Zhang, H., Ma, W., Han, Q., Ma, Z. (2023). Outlier Detection Model Based on Autoencoder and Data Augmentation for High-Dimensional Sparse Data. In: Yu, Z., et al. Data Science. ICPCSEE 2023. Communications in Computer and Information Science, vol 1879. Springer, Singapore. https://doi.org/10.1007/978-981-99-5968-6_14
DOI: https://doi.org/10.1007/978-981-99-5968-6_14
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-5967-9
Online ISBN: 978-981-99-5968-6
eBook Packages: Computer Science, Computer Science (R0)