Abstract
Datasets from different domains usually contain data defined over a wide set of attributes or features linked through correlation relationships. Moreover, in some applications not all attributes should be treated in the same fashion, since some of them can be regarded as independent variables that determine the expected behaviour of the remaining ones. Following this pattern, we focus on the detection of those data objects showing anomalous behaviour on a subset of attributes, called behavioural, with respect to the other ones, which we call contextual. As a first contribution, we exploit Mixture Models to describe the data distribution over each pair of behavioural-contextual attributes and learn the correlation laws binding the data in each bidimensional space. Then, we design a probability measure aimed at scoring subsequently observed objects based on how much their behaviour differs from the usual behavioural attribute values. Finally, we join the contributions calculated in each bidimensional space to provide a global outlierness measure. We test our method on both synthetic and real datasets to demonstrate its effectiveness when studying anomalous behaviour in a specific context and its ability to outperform some competitive baselines.
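The three steps outlined in the abstract (a model per behavioural-contextual pair, a per-pair probability score, a combined outlierness value) can be illustrated with a small sketch. This is not the authors' implementation: to stay within the standard library, a single bivariate Gaussian stands in for the mixture model, the per-pair score is assumed to be the probability mass of the conditional distribution lying closer to its mean than the observed behavioural value, and the per-pair scores are assumed to be combined by a plain average. All function and attribute names (`fit_pair`, `pair_score`, `outlierness`, `ctx`, `beh`) are hypothetical.

```python
import math
from itertools import product


def fit_pair(xs, ys):
    """Fit a bivariate Gaussian to (contextual x, behavioural y) samples.

    Assumption: one Gaussian replaces the mixture model of the paper.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    vx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    vy = sum((y - my) ** 2 for y in ys) / (n - 1)
    cxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    return mx, my, vx, vy, cxy


def pair_score(model, x, y):
    """Score in [0, 1]: how far y lies from its expected value given x."""
    mx, my, vx, vy, cxy = model
    # Conditional distribution of y given x for a bivariate Gaussian.
    cond_mean = my + cxy / vx * (x - mx)
    cond_var = max(vy - cxy ** 2 / vx, 1e-12)  # guard against zero variance
    z = abs(y - cond_mean) / math.sqrt(cond_var)
    # P(|Z| <= z) for a standard normal Z: close to 1 for unusual values.
    return math.erf(z / math.sqrt(2))


def fit(rows, behavioural, contextual):
    """Learn one model per (behavioural, contextual) attribute pair."""
    return {
        (b, c): fit_pair([r[c] for r in rows], [r[b] for r in rows])
        for b, c in product(behavioural, contextual)
    }


def outlierness(models, row):
    """Combine the per-pair scores (assumption: arithmetic mean)."""
    scores = [pair_score(m, row[c], row[b]) for (b, c), m in models.items()]
    return sum(scores) / len(scores)


# Toy training data: the behavioural attribute is roughly twice the
# contextual one, with a small deterministic perturbation.
train = [{"ctx": float(i), "beh": 2.0 * i + (0.1 if i % 2 == 0 else -0.1)}
         for i in range(20)]
models = fit(train, behavioural=["beh"], contextual=["ctx"])

normal_score = outlierness(models, {"ctx": 5.0, "beh": 10.0})
anomalous_score = outlierness(models, {"ctx": 5.0, "beh": 20.0})
print(normal_score, anomalous_score)
```

In this toy setting the value `beh = 20.0` would be unremarkable on its own, since it lies within the range seen in training, but it is far from what `ctx = 5.0` predicts, so it receives a much higher score than the object that follows the learned correlation.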
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Angiulli, F., Fassetti, F., Serrao, C. (2021). ODCA: An Outlier Detection Approach to Deal with Correlated Attributes. In: Golfarelli, M., Wrembel, R., Kotsis, G., Tjoa, A.M., Khalil, I. (eds) Big Data Analytics and Knowledge Discovery. DaWaK 2021. Lecture Notes in Computer Science, vol 12925. Springer, Cham. https://doi.org/10.1007/978-3-030-86534-4_17
DOI: https://doi.org/10.1007/978-3-030-86534-4_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86533-7
Online ISBN: 978-3-030-86534-4
eBook Packages: Computer Science, Computer Science (R0)