ODCA: An Outlier Detection Approach to Deal with Correlated Attributes

  • Conference paper
Big Data Analytics and Knowledge Discovery (DaWaK 2021)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12925)

Abstract

Datasets from different domains usually contain data defined over a wide set of attributes or features linked through correlation relationships. Moreover, in some applications not all attributes should be treated in the same fashion, since some of them can be regarded as independent variables that define the expected behaviour of the remaining ones. Following this pattern, we focus on detecting those data objects that show anomalous behaviour on a subset of attributes, called behavioural, with respect to the remaining ones, which we call contextual. As a first contribution, we exploit Mixture Models to describe the data distribution over each pair of behavioural-contextual attributes and learn the correlation laws binding the data in each bidimensional space. Then, we design a probability measure that scores subsequently observed objects based on how much their behaviour differs from the usual behavioural attribute values. Finally, we combine the contributions calculated in each bidimensional space to provide a global outlierness measure. We test our method on both synthetic and real datasets to demonstrate its effectiveness in studying anomalous behaviour in a specific context and its ability to outperform some competitive baselines.
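
To make the pipeline outlined in the abstract more concrete, the following Python sketch scores an object against a bivariate Gaussian mixture for each contextual-behavioural pair and then averages the per-pair scores. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact probability measure or the combination rule, so the two-sided conditional tail score and the plain mean used here are stand-ins. The mixture parameters are hard-coded as if they had already been learned (for example by EM), and the attribute names c1, c2 and b, as well as all function names, are hypothetical.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def normal_cdf(x, mean, var):
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * var)))

def conditional_cdf(components, x, y):
    """P(Y <= y | X = x) under a bivariate Gaussian mixture.

    Each component is (weight, (mu_x, mu_y), (var_x, cov_xy, var_y)),
    where X is the contextual and Y the behavioural attribute.
    """
    num = 0.0
    den = 0.0
    for weight, (mu_x, mu_y), (var_x, cov_xy, var_y) in components:
        # Responsibility of this component for the contextual value x.
        resp = weight * normal_pdf(x, mu_x, var_x)
        # Conditional Gaussian of y given x within this component.
        cond_mean = mu_y + cov_xy / var_x * (x - mu_x)
        cond_var = var_y - cov_xy ** 2 / var_x
        num += resp * normal_cdf(y, cond_mean, cond_var)
        den += resp
    if den == 0.0:
        raise ValueError("contextual value has zero density under the mixture")
    return num / den

def pair_score(components, x, y):
    # Assumed stand-in measure: 0 at the conditional median, close to 1 in
    # either tail of the conditional distribution of y given x.
    return abs(2.0 * conditional_cdf(components, x, y) - 1.0)

def global_score(models, obj):
    # Assumed combination rule: plain average over all attribute pairs.
    scores = [pair_score(m, obj[c], obj[b]) for (c, b), m in models.items()]
    return sum(scores) / len(scores)

# Hypothetical, already-fitted mixtures keyed by (contextual, behavioural).
models = {
    ("c1", "b"): [
        (0.5, (0.0, 0.0), (1.0, 0.8, 1.0)),
        (0.5, (5.0, 10.0), (1.0, 0.8, 1.0)),
    ],
    ("c2", "b"): [
        (1.0, (3.0, 0.0), (1.0, 0.0, 4.0)),
    ],
}

typical = {"c1": 0.0, "c2": 3.0, "b": 0.0}
anomalous = {"c1": 0.0, "c2": 3.0, "b": 3.0}

print(global_score(models, typical))
print(global_score(models, anomalous))
```

In this toy setting the second object shares the contextual values of the first but has a behavioural value far from what those contexts predict, so it receives a much higher score. Other combination rules (maximum, weighted mean) would fit the same structure.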

Notes

  1. https://archive.ics.uci.edu/ml/datasets/QSAR+aquatic+toxicity

  2. https://www.rdocumentation.org/packages/cellWise/versions/2.2.3/topics/data_philips

  3. https://www.kaggle.com/camnugent/california-housing-prices

  4. https://archive.ics.uci.edu/ml/datasets/QSAR+fish+toxicity

  5. https://archive.ics.uci.edu/ml/machine-learning-databases/housing/

  6. http://staff.pubhealth.ku.dk/~tag/Teaching/share/data/Bodyfat.html

Author information

Corresponding author

Correspondence to Cristina Serrao.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Angiulli, F., Fassetti, F., Serrao, C. (2021). ODCA: An Outlier Detection Approach to Deal with Correlated Attributes. In: Golfarelli, M., Wrembel, R., Kotsis, G., Tjoa, A.M., Khalil, I. (eds) Big Data Analytics and Knowledge Discovery. DaWaK 2021. Lecture Notes in Computer Science, vol 12925. Springer, Cham. https://doi.org/10.1007/978-3-030-86534-4_17

  • DOI: https://doi.org/10.1007/978-3-030-86534-4_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86533-7

  • Online ISBN: 978-3-030-86534-4

  • eBook Packages: Computer Science, Computer Science (R0)
