Cluster-based outlier detection

Annals of Operations Research

Abstract

Outlier detection has important applications in data mining, such as fraud detection, customer behavior analysis, and intrusion detection. It is the process of detecting data objects that are grossly different from or inconsistent with the rest of the data. Outliers are traditionally considered to be single points; however, many abnormal events exhibit both temporal and spatial locality and may therefore form small clusters that should also be deemed outliers. In other words, not only a single point but also a small cluster can be an outlier. In this paper, we present a new definition of outliers, the cluster-based outlier, which is meaningful and gives importance to local data behavior, and we show how to detect such outliers with the clustering algorithm LDBSCAN (Duan et al. in Inf. Syst. 32(7):978–986, 2007), which is capable of finding clusters and assigning LOF (Breunig et al. in Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, ACM Press, pp. 93–104, 2000) to single points.
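
As a rough illustration of the LOF (local outlier factor) score mentioned in the abstract, the following Python sketch computes point-wise LOF values following the definition in Breunig et al. (2000). It is neither the LDBSCAN algorithm nor the authors' code. The function names `k_nearest` and `lof_scores`, the parameter `k`, and the toy data are assumptions made for this example, and the sketch assumes that all points are distinct.

```python
import math

def k_nearest(points, i, k):
    """Return (k-distance of point i, indices of its k-distance neighborhood)."""
    dists = sorted(
        (math.dist(points[i], points[j]), j)
        for j in range(len(points)) if j != i
    )
    k_dist = dists[k - 1][0]
    # Keep ties at the k-distance, as the LOF definition does.
    neighbors = [j for d, j in dists if d <= k_dist]
    return k_dist, neighbors

def lof_scores(points, k):
    """Compute the local outlier factor of every point (assumes distinct points)."""
    n = len(points)
    info = [k_nearest(points, i, k) for i in range(n)]

    def lrd(i):
        # Local reachability density: inverse of the mean reachability distance,
        # where reach-dist(i, o) = max(k-distance(o), d(i, o)).
        nbrs = info[i][1]
        total = sum(
            max(info[o][0], math.dist(points[i], points[o])) for o in nbrs
        )
        return len(nbrs) / total

    lrds = [lrd(i) for i in range(n)]
    # LOF: average ratio of the neighbors' density to the point's own density.
    return [
        sum(lrds[o] for o in info[i][1]) / (len(info[i][1]) * lrds[i])
        for i in range(n)
    ]

# Toy data (hypothetical): a dense 3x3 grid plus one isolated point.
points = [(float(x), float(y)) for x in range(3) for y in range(3)]
points.append((10.0, 10.0))
scores = lof_scores(points, k=3)
for p, s in zip(points, scores):
    print(p, round(s, 2))
```

In this toy data the isolated point at (10, 10) receives a score far above 1, while the grid points stay close to 1. A point-wise score of this kind only ranks individual points; when `k` is smaller than the size of a small, tight cluster, the members of that cluster can each look locally normal, which is the case the cluster-based definition is meant to cover.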

References

  • Agrawal, R., Gehrke, J., Gunopulos, D., & Raghavan, P. (1998). Automatic subspace clustering of high dimensional data for data mining applications. SIGMOD Record, 27(2), 94–105. doi:10.1145/276305.276314.

  • Ankerst, M., Breunig, M. M., Kriegel, H., & Sander, J. (1999). OPTICS: ordering points to identify the clustering structure. In Proceedings of the 1999 ACM SIGMOD international conference on management of data (pp. 49–60). SIGMOD’99, Philadelphia, Pennsylvania, United States, May 31–June 03, 1999. New York: ACM Press.

  • Barnett, V., & Lewis, T. (1994). Outliers in statistical data. New York: Wiley.

  • Beyer, K. S., Goldstein, J., Ramakrishnan, R., & Shaft, U. (1999). When is “nearest neighbor” meaningful? In C. Beeri & P. Buneman (Eds.), Lecture notes in computer science: Vol. 1540. Proceedings of the 7th international conference on database theory (pp. 217–235). January 10–12, 1999. London: Springer.

  • Breunig, M. M., Kriegel, H., Ng, R. T., & Sander, J. (2000). LOF: identifying density-based local outliers. In Proceedings of the 2000 ACM SIGMOD international conference on management of data (pp. 93–104). SIGMOD’00, Dallas, Texas, United States, May 15–18, 2000. New York: ACM Press.

  • Carvalho, R., & Costa, H. (2007). Application of an integrated decision support process for supplier selection. Enterprise Information Systems, 1(2), 197–216. doi:10.1080/17517570701356208.

  • Crovella, M. E., & Bestavros, A. (1997). Self-similarity in World Wide Web traffic: evidence and possible causes. IEEE/ACM Transactions on Networking, 5(6), 835–846.

  • Duan, L., Xu, L., Guo, F., Lee, J., & Yan, B. (2007). A local-density based spatial clustering algorithm with noise. Information Systems, 32(7), 978–986. doi:10.1016/j.is.2006.10.006.

  • Ester, M., Kriegel, H., Sander, J., & Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. In Proc. 2nd int. conf. on knowledge discovery and data mining (pp. 226–231). Portland: AAAI Press.

  • Guha, S., Rastogi, R., & Shim, K. (1998). CURE: an efficient clustering algorithm for large databases. In A. Tiwary & M. Franklin (Eds.), Proceedings of the 1998 ACM SIGMOD international conference on management of data (pp. 73–84). SIGMOD’98, Seattle, Washington, United States, June 01–04, 1998. New York: ACM Press.

  • Han, J., & Kamber, M. (2006). Data mining: concepts and techniques. Amsterdam: Elsevier.

  • Hawkins, D. (1980). Identification of outliers. London: Chapman and Hall.

  • He, Z., Xu, X., & Deng, S. (2003). Discovering cluster-based local outliers. Pattern Recognition Letters, 24(9–10), 1641–1650. doi:10.1016/S0167-8655(02)00160-5.

  • Hinneburg, A., & Keim, D. (1998). An efficient approach to clustering in large multimedia databases with noise. In Proc. 4th int. conf. on knowledge discovery and data mining (pp. 58–65). New York.

  • Hinneburg, A., Aggarwal, C. C., & Keim, D. A. (2000). What is the nearest neighbor in high dimensional spaces? In A. E. Abbadi, M. L. Brodie, S. Chakravarthy, U. Dayal, N. Kamel, G. Schlageter, & K. Whang (Eds.), Proceedings of the 26th international conference on very large data bases (pp. 506–515). Very large data bases, September 10–14, 2000. San Francisco: Morgan Kaufmann Publishers.

  • Hsu, C., & Wallace, W. A. (2007). An industrial network flow information integration model for supply chain management and intelligent transportation. Enterprise Information Systems, 1(3), 327–351. doi:10.1080/17517570701504633.

  • Jiang, M. F., Tseng, S. S., & Su, C. M. (2001). Two-phase clustering process for outliers detection. Pattern Recognition Letters, 22(6–7), 691–700.

  • Johnson, T., Kwok, I., & Ng, R. (1998). Fast computation of 2-dimensional depth contours. In Proc. 4th int. conf. on knowledge discovery and data mining (pp. 224–228). New York: AAAI Press.

  • Knorr, E. M., & Ng, R. T. (1998). Algorithms for mining distance-based outliers in large datasets. In A. Gupta, O. Shmueli, & J. Widom (Eds.), Proceedings of the 24th international conference on very large data bases (pp. 392–403). Very large data bases, August 24–27, 1998. San Francisco: Morgan Kaufmann Publishers.

  • Knorr, E. M., & Ng, R. T. (1999). Finding intensional knowledge of distance-based outliers. In M. P. Atkinson, M. E. Orlowska, P. Valduriez, S. B. Zdonik, & M. L. Brodie (Eds.), Proceedings of the 25th international conference on very large data bases (pp. 211–222). Very large data bases, September 07–10, 1999. San Francisco: Morgan Kaufmann Publishers.

  • Li, H., & Xu, L. (2001). Feature space theory—a mathematical foundation for data mining. Knowledge-Based Systems, 14(5–6), 253–257. doi:10.1016/S0950-7051(01)00103-4.

  • Li, H., Xu, L., Wang, J., & Mo, Z. (2003). Feature space theory in data mining: transformations between extensions and intensions in knowledge representation. Expert Systems, 20(2), 60–71. doi:10.1111/1468-0394.00226.

  • Luo, J., Xu, L., Jamont, J., Zeng, L., & Shi, Z. (2007). Flood decision support system on agent grid: method and implementation. Enterprise Information Systems, 1(1), 49–68. doi:10.1080/17517570601092184.

  • Ng, R., & Han, J. (2002). CLARANS: a method for clustering objects for spatial data mining. IEEE Transactions on Knowledge and Data Engineering, 14(5), 1003–1016.

  • Preparata, F., & Shamos, M. (1988). Computational geometry: an introduction. Berlin: Springer.

  • Qiu, G., Li, H., Xu, L., & Zhang, W. (2003). A knowledge processing method for intelligent systems based on inclusion degree. Expert Systems, 20(4), 187–195. doi:10.1111/1468-0394.00243.

  • Ramaswamy, S., Rastogi, R., & Shim, K. (2000). Efficient algorithms for mining outliers from large data sets. In Proceedings of the 2000 ACM SIGMOD international conference on management of data (pp. 427–438). SIGMOD’00, Dallas, Texas, United States, May 15–18, 2000. New York: ACM Press.

  • Sheikholeslami, G., Chatterjee, S., & Zhang, A. (1998). WaveCluster: a multi-resolution clustering approach for very large spatial databases. In A. Gupta, O. Shmueli, & J. Widom (Eds.), Proceedings of the 24th international conference on very large data bases (pp. 428–439). Very large data bases, August 24–27, 1998. San Francisco: Morgan Kaufmann Publishers.

  • Shi, Z., Huang, Y., He, Q., Xu, L., Liu, S., Qin, L., Jia, Z., Li, J., Huang, H., & Zhao, L. (2007). MSMiner-a developing platform for OLAP. Decision Support Systems, 42(4), 2016–2028. doi:10.1016/j.dss.2004.11.006.

  • Tukey, J. W. (1977). Exploratory data analysis. Reading: Addison–Wesley.

  • Wang, W., Yang, J., & Muntz, R. R. (1997). STING: a statistical information grid approach to spatial data mining. In M. Jarke, M. J. Carey, K. R. Dittrich, F. H. Lochovsky, P. Loucopoulos, & M. A. Jeusfeld (Eds.), Proceedings of the 23rd international conference on very large data bases (pp. 186–195). Very large data bases, August 25–29, 1997. San Francisco: Morgan Kaufmann Publishers.

  • Xu, L. (2006). Advances in intelligent information processing. Expert Systems, 23(5), 249–250. doi:10.1111/j.1468-0394.2006.00405.x.

  • Xu, L., Liang, N., & Gao, Q. (2008). An integrated approach for agricultural ecosystem management. IEEE Transactions on Systems, Man, and Cybernetics, Part C, 38(3).

  • Zhang, M., Xu, L., Zhang, W., & Li, H. (2003). A rough set approach to knowledge reduction based on inclusion degree and evidence reasoning theory. Expert Systems, 20(5), 298–304. doi:10.1111/1468-0394.00254.

  • Zhang, T., Ramakrishnan, R., & Livny, M. (1996). BIRCH: an efficient data clustering method for very large databases. In J. Widom (Ed.), Proceedings of the 1996 ACM SIGMOD international conference on management of data (pp. 103–114). SIGMOD’96, Montreal, Quebec, Canada, June 04–06, 1996. New York: ACM Press.

Corresponding author

Correspondence to Lian Duan.

Cite this article

Duan, L., Xu, L., Liu, Y. et al. Cluster-based outlier detection. Ann Oper Res 168, 151–168 (2009). https://doi.org/10.1007/s10479-008-0371-9

  • DOI: https://doi.org/10.1007/s10479-008-0371-9
