ODCA: An Outlier Detection Approach to Deal with Correlated Attributes

Angiulli, Fabrizio; Fassetti, Fabio; Serrao, Cristina

doi:10.1007/978-3-030-86534-4_17

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12925)

Included in the following conference series:

International Conference on Big Data Analytics and Knowledge Discovery

Abstract

Datasets from different domains usually contain data defined over a wide set of attributes, or features, linked through correlation relationships. Moreover, in some applications not all attributes should be treated in the same fashion, since some of them can be regarded as independent variables that determine the expected behaviour of the remaining ones. Following this pattern, we focus on detecting those data objects that show anomalous behaviour on a subset of attributes, called behavioural, with respect to the other ones, which we call contextual. As a first contribution, we exploit Mixture Models to describe the data distribution over each pair of behavioural-contextual attributes and learn the correlation laws binding the data on each bidimensional space. Then, we design a probability measure aimed at scoring subsequently observed objects based on how much their behaviour differs from the usual behavioural attribute values. Finally, we join the contributions calculated in each bidimensional space to provide a global outlierness measure. We test our method on both synthetic and real datasets to demonstrate its effectiveness when studying anomalous behaviour in a specific context and its ability to outperform some competitive baselines.
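
The pipeline outlined in the abstract (a model per behavioural-contextual pair, a per-pair probability score, and a combined global score) can be illustrated with a rough sketch. This is not the authors' implementation: it assumes a single bivariate Gaussian per pair in place of a full mixture model, scores an object by the conditional probability mass of behavioural values less extreme than the observed one, and averages the per-pair scores, whereas the abstract does not say how the contributions are joined. All function names, attribute names, and data below are hypothetical.

```python
import math
from statistics import fmean


def fit_pair(xs, ys):
    """Fit one bivariate Gaussian to (contextual x, behavioural y) samples.

    Simplifying assumption: a single component stands in for a mixture model.
    """
    mx, my = fmean(xs), fmean(ys)
    n = len(xs)
    vx = sum((x - mx) ** 2 for x in xs) / n
    vy = sum((y - my) ** 2 for y in ys) / n
    cxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    return mx, my, vx, vy, cxy


def pair_score(model, x, y):
    """Score in [0, 1]: how unusual y is given x under the fitted Gaussian."""
    mx, my, vx, vy, cxy = model
    # Conditional distribution of the behavioural value given the contextual one.
    cond_mean = my + cxy / vx * (x - mx)
    cond_var = max(vy - cxy ** 2 / vx, 1e-12)  # floor avoids division by zero
    z = abs(y - cond_mean) / math.sqrt(cond_var)
    # Probability that a conditional draw lies closer to the mean than y does.
    return math.erf(z / math.sqrt(2))


def fit(train, contextual, behavioural):
    """Learn one model for every (contextual, behavioural) attribute pair."""
    return {
        (c, b): fit_pair([row[c] for row in train], [row[b] for row in train])
        for c in contextual
        for b in behavioural
    }


def outlierness(models, obj):
    """Global score: mean of the per-pair scores (an assumed combination rule)."""
    return fmean(pair_score(m, obj[c], obj[b]) for (c, b), m in models.items())


# Hypothetical training data: "load" is roughly twice "temp", with small noise.
train = [
    {"temp": float(i), "load": 2.0 * i + (0.1 if i % 2 else -0.1)}
    for i in range(20)
]
models = fit(train, contextual=["temp"], behavioural=["load"])

typical = {"temp": 10.0, "load": 20.0}  # follows the learned correlation
unusual = {"temp": 10.0, "load": 5.0}   # plausible load overall, wrong for this temp

score_typical = outlierness(models, typical)
score_unusual = outlierness(models, unusual)
print(score_typical, score_unusual)
```

In this toy example the second object has a load value that lies well within the overall range seen in training, yet it receives a much higher score because it violates the correlation with its contextual attribute. That contrast is the kind of anomaly the abstract targets.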

Author information

Authors and Affiliations

DIMES, University of Calabria, 87036, Rende, CS, Italy
Fabrizio Angiulli, Fabio Fassetti & Cristina Serrao

Corresponding author

Correspondence to Cristina Serrao.

Editor information

Editors and Affiliations

University of Bologna, Bologna, Forli/Cesena, Italy
Matteo Golfarelli
Poznań University of Technology, Poznan, Poland
Robert Wrembel
Johannes Kepler University Linz, Linz, Austria
Gabriele Kotsis
TU Wien, Vienna, Austria
A Min Tjoa
Johannes Kepler University Linz, Linz, Austria
Ismail Khalil

About this paper

Cite this paper

Angiulli, F., Fassetti, F., Serrao, C. (2021). ODCA: An Outlier Detection Approach to Deal with Correlated Attributes. In: Golfarelli, M., Wrembel, R., Kotsis, G., Tjoa, A.M., Khalil, I. (eds) Big Data Analytics and Knowledge Discovery. DaWaK 2021. Lecture Notes in Computer Science, vol 12925. Springer, Cham. https://doi.org/10.1007/978-3-030-86534-4_17

DOI: https://doi.org/10.1007/978-3-030-86534-4_17
Published: 05 September 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86533-7
Online ISBN: 978-3-030-86534-4
eBook Packages: Computer Science, Computer Science (R0)
