Improving Detection Efficiency: Optimizing Block Size in the Local Outlier Factor (LOF) Algorithm

  • Conference paper
Rough Sets (IJCRS 2023)

Abstract

Outlier detection is essential in many fields, including finance and healthcare. Among well-known outlier detection algorithms, the Local Outlier Factor (LOF) is widely used to identify unusual data points. However, its computation time increases significantly on large datasets containing both numerical and categorical features. We propose an approach that optimizes the block size to speed up outlier detection while maintaining high accuracy. Experimental results on diverse datasets with mixed categorical and numerical features show that the method accelerates LOF without compromising its detection accuracy. Faster detection supports the timely identification of anomalous events, which matters in critical applications such as cybersecurity, and can thereby improve decision-making.
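The abstract does not define what a "block" is, so the following is only a minimal sketch of one plausible reading: LOF scores computed with a k-nearest-neighbour search that processes the distance matrix one block of query rows at a time. The function names (`knn_blockwise`, `lof_scores`), the `block_size` parameter, the Euclidean distance, and the restriction to numerical features are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def knn_blockwise(X, k, block_size):
    """k nearest neighbours of every row of X (Euclidean), computed
    one block of query rows at a time to bound memory use.

    Hypothetical helper: `block_size` is the number of query rows
    whose distances to all points are held in memory at once.
    """
    n = X.shape[0]
    dists = np.empty((n, k))
    idx = np.empty((n, k), dtype=int)
    sq_norms = (X ** 2).sum(axis=1)
    for start in range(0, n, block_size):
        stop = min(start + block_size, n)
        # Squared distances from the block's rows to all n points.
        d2 = (sq_norms[start:stop, None]
              - 2.0 * X[start:stop] @ X.T
              + sq_norms[None, :])
        np.maximum(d2, 0.0, out=d2)  # clip rounding noise below zero
        # A point is never its own neighbour.
        d2[np.arange(stop - start), np.arange(start, stop)] = np.inf
        # Select the k smallest per row, then sort those k.
        nn = np.argpartition(d2, k - 1, axis=1)[:, :k]
        nn_d2 = np.take_along_axis(d2, nn, axis=1)
        order = np.argsort(nn_d2, axis=1)
        idx[start:stop] = np.take_along_axis(nn, order, axis=1)
        dists[start:stop] = np.sqrt(np.take_along_axis(nn_d2, order, axis=1))
    return dists, idx


def lof_scores(X, k=5, block_size=256):
    """LOF score per row, using exactly k neighbours per point
    (ties at the k-distance are not expanded)."""
    dists, idx = knn_blockwise(X, k, block_size)
    k_dist = dists[:, -1]                    # k-distance of each point
    # reach-dist(p, o) = max(k-distance(o), d(p, o))
    reach = np.maximum(k_dist[idx], dists)
    lrd = 1.0 / (reach.mean(axis=1) + 1e-12)  # local reachability density
    # LOF(p) = mean lrd of p's neighbours divided by lrd of p
    return lrd[idx].mean(axis=1) / lrd
```

In this sketch the block size changes only memory use and running time, not the scores, so candidate values can be compared by timing `lof_scores` on a sample of the data. Categorical features would first need a numerical encoding or a mixed-type distance, which this sketch does not cover.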

Author information

Corresponding author

Correspondence to Czesław Horyń.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Horyń, C., Nowak-Brzezińska, A. (2023). Improving Detection Efficiency: Optimizing Block Size in the Local Outlier Factor (LOF) Algorithm. In: Campagner, A., Urs Lenz, O., Xia, S., Ślęzak, D., Wąs, J., Yao, J. (eds) Rough Sets. IJCRS 2023. Lecture Notes in Computer Science, vol 14481. Springer, Cham. https://doi.org/10.1007/978-3-031-50959-9_43

  • DOI: https://doi.org/10.1007/978-3-031-50959-9_43

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-50958-2

  • Online ISBN: 978-3-031-50959-9

  • eBook Packages: Computer Science, Computer Science (R0)
