Abstract
Detecting outliers in data is essential in many fields where anomalies must be identified, such as finance and healthcare. Among well-known outlier detection algorithms, the Local Outlier Factor (LOF) is widely used to identify unusual data points. However, the computational time of LOF increases significantly on large datasets containing both numerical and categorical features. We propose an approach based on block size optimization that speeds up the outlier detection process while maintaining high accuracy. By optimizing the block size, we significantly improve LOF’s runtime performance without compromising its effectiveness. Experimental results on diverse datasets with mixed categorical and numerical features demonstrate that our method accelerates outlier detection while retaining high detection accuracy. This advancement has the potential to improve decision-making processes by enabling the timely identification of anomalous events, which is important in critical applications such as cybersecurity.
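The abstract does not spell out how the block size enters the LOF computation. One common place a block size appears in LOF implementations is the pairwise-distance step: distances are computed for one block of query points at a time, so only a block_size by n slice of the distance matrix is held in memory. The sketch below is a hypothetical illustration of that idea for mixed numerical/categorical records, not the authors' method. The record layout, the mixed distance (Euclidean distance on numerical features plus a count of mismatched categories), and the function names `mixed_distance`, `knn_in_blocks`, and `lof_scores` are all assumptions. It also uses exactly k neighbours per point, a common simplification of the original LOF definition by Breunig et al. (2000), which expands ties at the k-distance.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical record layout: (numerical features, categorical features).
Record = Tuple[Sequence[float], Sequence[str]]


def mixed_distance(a: Record, b: Record) -> float:
    """Euclidean distance on numerical features plus the number of mismatched
    categorical values (simple overlap measure). An illustrative choice only."""
    numeric = math.sqrt(sum((x - y) ** 2 for x, y in zip(a[0], b[0])))
    mismatches = sum(1 for x, y in zip(a[1], b[1]) if x != y)
    return numeric + mismatches


def knn_in_blocks(data: List[Record], k: int,
                  block_size: int) -> List[List[Tuple[float, int]]]:
    """For every point, return its k nearest neighbours as (distance, index),
    excluding the point itself.

    Distances are computed one block of query rows at a time, so only a
    block_size-by-n slice of the distance matrix is held at once."""
    n = len(data)
    neighbours: List[List[Tuple[float, int]]] = []
    for start in range(0, n, block_size):
        stop = min(start + block_size, n)
        # Materialise one block of rows of the distance matrix.
        block = [[mixed_distance(data[i], data[j]) for j in range(n)]
                 for i in range(start, stop)]
        for offset, row in enumerate(block):
            i = start + offset
            candidates = sorted((d, j) for j, d in enumerate(row) if j != i)
            neighbours.append(candidates[:k])
    return neighbours


def lof_scores(data: List[Record], k: int = 3,
               block_size: int = 64) -> List[float]:
    """LOF score per point; values well above 1 indicate likely outliers."""
    neighbours = knn_in_blocks(data, k, block_size)
    # k-distance of a point: distance to its k-th nearest neighbour.
    k_distance = [nbrs[-1][0] for nbrs in neighbours]

    # Local reachability density: inverse mean reachability distance,
    # where reach-dist(p, o) = max(k-distance(o), d(p, o)).
    lrd = []
    for nbrs in neighbours:
        reach = [max(k_distance[j], d) for d, j in nbrs]
        lrd.append(1.0 / (sum(reach) / len(reach) + 1e-10))  # epsilon guards duplicates

    # LOF: mean ratio of the neighbours' density to the point's own density.
    return [sum(lrd[j] for _, j in nbrs) / (len(nbrs) * lrd[i])
            for i, nbrs in enumerate(neighbours)]


if __name__ == "__main__":
    toy = [((0.0, 0.0), ("a",)), ((0.1, 0.0), ("a",)), ((0.0, 0.1), ("a",)),
           ((0.1, 0.1), ("a",)), ((0.05, 0.05), ("a",)), ((5.0, 5.0), ("b",))]
    print(lof_scores(toy, k=3, block_size=2))  # last score is far above 1
```

In this sketch the scores do not depend on `block_size`; only the peak memory of the distance step does. In a vectorized implementation (for example with NumPy), larger blocks typically amortize per-call overhead while smaller blocks bound memory, which is the kind of trade-off a block size optimization would tune.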
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Horyń, C., Nowak-Brzezińska, A. (2023). Improving Detection Efficiency: Optimizing Block Size in the Local Outlier Factor (LOF) Algorithm. In: Campagner, A., Urs Lenz, O., Xia, S., Ślęzak, D., Wąs, J., Yao, J. (eds) Rough Sets. IJCRS 2023. Lecture Notes in Computer Science, vol 14481. Springer, Cham. https://doi.org/10.1007/978-3-031-50959-9_43
DOI: https://doi.org/10.1007/978-3-031-50959-9_43
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-50958-2
Online ISBN: 978-3-031-50959-9
eBook Packages: Computer Science, Computer Science (R0)