Modeling Outlier Score Distributions

  • Conference paper
Advanced Data Mining and Applications (ADMA 2012)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7713)

Abstract

A common approach to outlier detection is to provide a ranked list of objects based on an estimated outlier score for each object. A major problem with such an approach is determining how many objects from the ranked list should be chosen as outliers. Other outlier detection methods transform the outlier scores into probability values and then use a user-predefined threshold to identify outliers. Ad hoc threshold values, which are hard to justify, are often used, and outlier detection accuracy can be seriously reduced if an incorrect threshold value is chosen. To address these problems, we propose a formal approach to analyse the outlier scores in order to automatically discriminate between outliers and inliers. Specifically, we devise a probabilistic approach to model the score distributions of outlier scoring algorithms. The probability density function of the outlier scores is estimated from the data, and the outlier objects are then identified automatically, without a user-supplied threshold.
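
To make the idea of modeling a score distribution concrete, here is a minimal sketch. It is not the paper's method: the abstract does not name the distribution family or the estimation procedure, so the sketch assumes a two-component Gaussian mixture fitted with EM, and it flags a score as an outlier when the high-score component explains it better than the low-score component. The function names (`fit_two_component_mixture`, `flag_outliers`) and the toy scores are hypothetical.

```python
import math


def normal_pdf(x, mean, std):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def fit_two_component_mixture(scores, n_iter=100, min_std=1e-3):
    """Fit a two-component 1-D Gaussian mixture to outlier scores with EM.

    Assumption: Gaussian components are used only to keep the sketch short.
    Returns (weights, means, stds) as lists of length 2.
    """
    ordered = sorted(scores)
    # Start one component at the lower quartile and one at the maximum score.
    means = [ordered[len(ordered) // 4], ordered[-1]]
    spread = max((ordered[-1] - ordered[0]) / 4.0, min_std)
    stds = [spread, spread]
    weights = [0.5, 0.5]

    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each score.
        resp = []
        for x in scores:
            p = [weights[k] * normal_pdf(x, means[k], stds[k]) for k in range(2)]
            total = p[0] + p[1]
            if total > 0.0:
                resp.append((p[0] / total, p[1] / total))
            else:
                # Both densities underflowed: assign to the nearer mean.
                nearer = 0 if abs(x - means[0]) <= abs(x - means[1]) else 1
                resp.append((1.0, 0.0) if nearer == 0 else (0.0, 1.0))

        # M-step: re-estimate weight, mean and standard deviation.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue  # component has collapsed; leave it unchanged
            weights[k] = nk / len(scores)
            means[k] = sum(r[k] * x for r, x in zip(resp, scores)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, scores)) / nk
            stds[k] = max(math.sqrt(var), min_std)

    return weights, means, stds


def flag_outliers(scores, weights, means, stds):
    """Flag scores that the higher-mean component explains better."""
    hi = 0 if means[0] > means[1] else 1
    lo = 1 - hi
    flags = []
    for x in scores:
        p_hi = weights[hi] * normal_pdf(x, means[hi], stds[hi])
        p_lo = weights[lo] * normal_pdf(x, means[lo], stds[lo])
        # Only scores above the low-score mean can be outliers.
        flags.append(x > means[lo] and p_hi > p_lo)
    return flags


# Toy data (hypothetical): 200 low scores near 1.0 and 5 clearly higher scores.
inlier_scores = [1.0 + 0.1 * math.sin(i) for i in range(200)]
outlier_scores = [2.8, 2.9, 3.0, 3.1, 3.3]
scores = inlier_scores + outlier_scores

weights, means, stds = fit_two_component_mixture(scores)
flags = flag_outliers(scores, weights, means, stds)
print(sum(flags), "of", len(scores), "scores flagged as outliers")
```

The decision rule here replaces a hand-picked cut-off with a comparison of the two fitted component densities, which is the general motivation stated in the abstract. A different component family or a model-selection step for the number of components would change the fit but not the overall shape of the procedure.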

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bouguessa, M. (2012). Modeling Outlier Score Distributions. In: Zhou, S., Zhang, S., Karypis, G. (eds) Advanced Data Mining and Applications. ADMA 2012. Lecture Notes in Computer Science, vol 7713. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35527-1_59

  • DOI: https://doi.org/10.1007/978-3-642-35527-1_59

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-35526-4

  • Online ISBN: 978-3-642-35527-1

  • eBook Packages: Computer Science (R0)
