Abstract
Many real-world problems deal with collections of high-dimensional data, i.e., data with many different features. A dataset with a large number of features suffers from the so-called curse of dimensionality: as the dimensionality increases, the volume of the space grows rapidly, and the data become sparse. This makes clustering high-dimensional data for outlier detection challenging. In this paper, we design and implement a distributed peer-to-peer version of an algorithm that addresses the curse of dimensionality by generating candidate subspaces from the high-dimensional space through Principal Component Analysis. The experimental results show that, if the parameters of the distributed algorithm are properly set, the distributed algorithm converges to the results provided by the sequential algorithm, which is a fundamental and highly desirable property.
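The convergence property claimed above can be illustrated with a small sketch. The abstract does not describe the paper's protocol, so what follows is an assumption: a generic push-pull gossip averaging scheme over a fully connected overlay, in which each peer repeatedly averages its estimate with that of a random partner. The names `gossip_average`, `max_error`, `local_values`, and the `rounds` parameter are hypothetical and chosen only for illustration. The sketch shows how every peer's estimate can approach the value a sequential computation would produce once enough rounds are run.

```python
import random


def gossip_average(local_values, rounds, seed=0):
    """Push-pull gossip averaging (illustrative; not the paper's protocol).

    Assumes a fully connected overlay in which any peer can contact any other.
    """
    rng = random.Random(seed)
    estimates = list(local_values)
    n = len(estimates)
    for _ in range(rounds):
        for i in range(n):
            # Pick a random partner different from peer i.
            j = rng.randrange(n - 1)
            if j >= i:
                j += 1
            # Both peers adopt the pairwise mean; each exchange
            # preserves the global sum of the estimates.
            mean = (estimates[i] + estimates[j]) / 2.0
            estimates[i] = mean
            estimates[j] = mean
    return estimates


def max_error(estimates, target):
    """Largest deviation of any peer's estimate from the target value."""
    return max(abs(e - target) for e in estimates)


# Hypothetical per-peer statistics (one scalar per peer).
local_values = [float(v) for v in range(1, 17)]
sequential_mean = sum(local_values) / len(local_values)

few_rounds = gossip_average(local_values, rounds=2)
many_rounds = gossip_average(local_values, rounds=60)

print("error after 2 rounds: ", max_error(few_rounds, sequential_mean))
print("error after 60 rounds:", max_error(many_rounds, sequential_mean))
```

In this sketch the number of rounds plays the role of a parameter that must be "properly set": with too few rounds the peers still disagree, while with enough rounds each estimate is numerically indistinguishable from the sequential mean.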
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Monaca, A.D., Cafaro, M., Pulimeno, M., Epicoco, I. (2023). An Adaptive Clustering Approach for Distributed Outlier Detection in Data Streams. In: Omatu, S., Mehmood, R., Sitek, P., Cicerone, S., Rodríguez, S. (eds) Distributed Computing and Artificial Intelligence, 19th International Conference. DCAI 2022. Lecture Notes in Networks and Systems, vol 583. Springer, Cham. https://doi.org/10.1007/978-3-031-20859-1_10
DOI: https://doi.org/10.1007/978-3-031-20859-1_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20858-4
Online ISBN: 978-3-031-20859-1
eBook Packages: Intelligent Technologies and Robotics (R0)