An Adaptive Clustering Approach for Distributed Outlier Detection in Data Streams

Monaca, Andrea Della; Cafaro, Massimo; Pulimeno, Marco; Epicoco, Italo

doi:10.1007/978-3-031-20859-1_10

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 583)

Included in the following conference series:

International Symposium on Distributed Computing and Artificial Intelligence

Abstract

Many real-world problems involve collections of high-dimensional data, i.e., data with many different features. A dataset with a large number of features suffers from the so-called curse of dimensionality: as the dimensionality increases, the volume of the space grows rapidly and the data become sparse. This makes clustering high-dimensional data for outlier detection challenging. In this paper, we design and implement a distributed peer-to-peer version of an algorithm that addresses the curse of dimensionality by generating candidate subspaces from the high-dimensional space through Principal Component Analysis. The experimental results show that, if the parameters of the distributed algorithm are properly set, the distributed algorithm converges to the results provided by the sequential algorithm, which is a fundamental and highly desirable property.
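The abstract does not spell out the protocol, so the following is only an illustrative sketch of the convergence property it describes, not the authors' implementation. It assumes that peers exchange local sufficient statistics (count, feature sums, sum of outer products) through pairwise gossip averaging, and that each peer then derives principal components from the averaged statistics. All function names, the peer partition, and the number of gossip steps are hypothetical choices made for this example.

```python
import numpy as np

def peer_state(X):
    """Pack one peer's local statistics into a flat vector:
    [count, per-feature sums, flattened sum of outer products]."""
    return np.concatenate(([len(X)], X.sum(axis=0), (X.T @ X).ravel()))

def gossip_average(states, steps, rng):
    """Assumed protocol: at each step two distinct random peers
    replace their states with the average of the two."""
    states = states.copy()
    for _ in range(steps):
        i, j = rng.choice(len(states), size=2, replace=False)
        states[i] = states[j] = (states[i] + states[j]) / 2.0
    return states

def covariance_from_state(state, d):
    """Recover the (population) covariance from averaged statistics.
    The unknown number of peers cancels out in the ratios."""
    n = state[0]
    mean = state[1:1 + d] / n
    second_moment = state[1 + d:].reshape(d, d) / n
    return second_moment - np.outer(mean, mean)

def top_components(cov, k):
    """Eigenvalues and eigenvectors of the k largest principal directions."""
    eigvals, eigvecs = np.linalg.eigh(cov)      # ascending order
    order = np.argsort(eigvals)[::-1][:k]
    return eigvals[order], eigvecs[:, order]

rng = np.random.default_rng(0)
d, k = 6, 2
# Synthetic correlated data standing in for a high-dimensional dataset.
X = rng.normal(size=(1000, d)) @ rng.normal(size=(d, d))

# Eight peers holding unequal shares of the data (hypothetical partition).
chunks = np.split(X, [50, 200, 300, 450, 600, 800, 950])
states = np.stack([peer_state(c) for c in chunks])
mixed = gossip_average(states, steps=500, rng=rng)

# Sequential baseline versus what a single peer computes after gossip.
cov_seq = np.cov(X, rowvar=False, bias=True)
cov_peer0 = covariance_from_state(mixed[0], d)
vals_seq, _ = top_components(cov_seq, k)
vals_peer0, _ = top_components(cov_peer0, k)

max_gap = np.abs(cov_peer0 - cov_seq).max()
print("largest covariance entry gap:", max_gap)
print("top eigenvalues (sequential):", vals_seq)
print("top eigenvalues (peer 0):    ", vals_peer0)
```

In this sketch the number of gossip steps plays the role of a tunable parameter: with too few steps the peers' statistics have not mixed and the per-peer covariance differs from the sequential one, while with enough steps the two agree up to floating-point error. Eigenvalues are compared rather than eigenvectors because eigenvector signs are arbitrary.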

Author information

Authors and Affiliations

University of Salento, Lecce, Italy
Andrea Della Monaca, Massimo Cafaro, Marco Pulimeno & Italo Epicoco
Euro-Mediterranean Centre on Climate Change, Foundation, Lecce, Italy
Massimo Cafaro & Italo Epicoco

Corresponding author

Correspondence to Andrea Della Monaca.

Editor information

Editors and Affiliations

Hiroshima University, Hiroshima, Japan
Sigeru Omatu
King Abdulaziz University, Jeddah, Saudi Arabia
Rashid Mehmood
Kielce University of Technology, Kielce, Poland
Pawel Sitek
Palazzo Camponeschi, University of L'Aquila, L'Aquila, Italy
Serafino Cicerone
BISITE, Edificio I+D+i, University of Salamanca, Salamanca, Spain
Sara Rodríguez

About this paper

Cite this paper

Monaca, A.D., Cafaro, M., Pulimeno, M., Epicoco, I. (2023). An Adaptive Clustering Approach for Distributed Outlier Detection in Data Streams. In: Omatu, S., Mehmood, R., Sitek, P., Cicerone, S., Rodríguez, S. (eds) Distributed Computing and Artificial Intelligence, 19th International Conference. DCAI 2022. Lecture Notes in Networks and Systems, vol 583. Springer, Cham. https://doi.org/10.1007/978-3-031-20859-1_10

DOI: https://doi.org/10.1007/978-3-031-20859-1_10
Published: 13 December 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20858-4
Online ISBN: 978-3-031-20859-1
eBook Packages: Intelligent Technologies and Robotics (R0)