Abstract
The ever-increasing volume of spatial data has greatly challenged our ability to extract useful but implicit knowledge from it. As an important branch of spatial data mining, spatial outlier detection aims to discover objects whose non-spatial attribute values differ significantly from those of their spatial neighbors. These objects, called spatial outliers, may reveal important phenomena in a number of applications, including traffic control, satellite image analysis, weather forecasting, and medical diagnosis. Most existing spatial outlier detection algorithms focus on identifying single-attribute outliers and can misclassify normal objects as outliers when their neighborhoods contain real spatial outliers with very large or very small attribute values. In addition, many spatial applications involve multiple non-spatial attributes, which should be processed together to identify outliers. To address these two issues, we formulate the spatial outlier detection problem in a general way, design two robust detection algorithms, one for a single attribute and the other for multiple attributes, and analyze their computational complexities. Experiments were conducted on a real-world data set, West Nile virus data, to validate the effectiveness of the proposed algorithms.
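The misclassification issue described above can be illustrated with a small sketch. It is not the algorithms proposed in the article, which the abstract does not specify. It assumes a toy setting invented for illustration: objects placed on a line, a neighborhood made of the objects within two positions, and a score equal to the absolute difference between an object's value and a summary of its neighbors' values. The names `neighbors` and `top_outliers` and the data values are hypothetical. The summary is either the mean or the median, the median being a common robust substitute.

```python
from statistics import mean, median


def neighbors(i, n, radius=2):
    """Indices within `radius` positions of i on a line, excluding i itself."""
    return [j for j in range(max(0, i - radius), min(n, i + radius + 1)) if j != i]


def top_outliers(values, summary, m):
    """Return the m indices with the largest |value - summary(neighbor values)|."""
    n = len(values)
    score = [
        abs(values[i] - summary([values[j] for j in neighbors(i, n)]))
        for i in range(n)
    ]
    # Sort by descending score; ties are broken by the smaller index.
    return sorted(range(n), key=lambda i: (-score[i], i))[:m]


# Toy data (assumed): a flat background of 10.0 with two true outliers,
# a very large one at index 5 and a moderate one at index 12.
values = [10.0] * 16
values[5] = 100.0
values[12] = 25.0

by_mean = top_outliers(values, mean, 2)
by_median = top_outliers(values, median, 2)
print("mean-based top 2:  ", by_mean)
print("median-based top 2:", by_median)
```

In this toy data the large value at index 5 inflates the neighborhood mean of the normal objects around it, so one of them outranks the moderate true outlier at index 12. With the median as the neighborhood summary, the two true outliers are ranked first.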
Cite this article
Chen, D., Lu, CT., Kou, Y. et al. On Detecting Spatial Outliers. Geoinformatica 12, 455–475 (2008). https://doi.org/10.1007/s10707-007-0038-8