On detecting spatial categorical outliers

GeoInformatica

Abstract

Spatial outlier detection is an important research problem that has received much attention in recent years. Most existing approaches are designed for numerical attributes and are not applicable to categorical ones (e.g., binary, ordinal, and nominal), which are common in many applications. The main challenges are modeling spatial categorical dependency and achieving computational efficiency. This paper presents the first outlier detection framework for spatial categorical data. Specifically, a new metric, named Pair Correlation Ratio (PCR), is measured for each pair of category sets based on their co-occurrence frequencies at specific spatial distance ranges. The relevance between spatial objects is then calculated from the PCR values with regard to their spatial distances. The outlierness of each object is defined as the inverse of the average relevance between the object and its spatial neighbors. The objects with the highest outlier scores are returned as spatial categorical outliers. A set of algorithms is further designed for single-attribute and multi-attribute spatial categorical datasets. Extensive experimental evaluations on both simulated and real datasets demonstrate the effectiveness and efficiency of the proposed approaches.
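
The pipeline described in the abstract can be illustrated with a minimal single-attribute sketch. It is not the paper's algorithm: the abstract does not give the exact PCR formula, so the sketch assumes PCR is the observed share of a category pair within a distance bin divided by the share expected if categories were independent of location. The function names, the fixed-width distance bins (`bin_width`, `max_dist`), the k-nearest-neighbor neighborhood, and the small `eps` guard are all hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import combinations
from math import dist


def distance_bin(d, bin_width):
    """Map a distance to the index of its fixed-width distance range."""
    return int(d // bin_width)


def pair_correlation_ratios(points, labels, bin_width, max_dist):
    """Assumed PCR: observed / expected share of a category pair per bin."""
    n = len(points)
    freq = Counter(labels)
    pair_counts = Counter()  # (bin, cat_a, cat_b) -> co-occurrence count
    bin_totals = Counter()   # bin -> number of object pairs in that bin
    for i, j in combinations(range(n), 2):
        d = dist(points[i], points[j])
        if d > max_dist:
            continue
        b = distance_bin(d, bin_width)
        a, c = sorted((labels[i], labels[j]))
        pair_counts[(b, a, c)] += 1
        bin_totals[b] += 1
    total_pairs = n * (n - 1) / 2
    pcr = {}
    for (b, a, c), count in pair_counts.items():
        # Share of all object pairs carrying categories (a, c),
        # regardless of where the objects are located.
        if a == c:
            expected = freq[a] * (freq[a] - 1) / 2 / total_pairs
        else:
            expected = freq[a] * freq[c] / total_pairs
        pcr[(b, a, c)] = (count / bin_totals[b]) / expected
    return pcr


def outlier_scores(points, labels, k, bin_width, max_dist, eps=1e-9):
    """Outlierness = inverse of the average relevance to the k nearest
    neighbors, where relevance is the PCR of the two objects' categories
    at the distance bin separating them (0 if that pair was never seen)."""
    pcr = pair_correlation_ratios(points, labels, bin_width, max_dist)
    scores = []
    for i, p in enumerate(points):
        others = (j for j in range(len(points)) if j != i)
        neighbors = sorted(others, key=lambda j: dist(p, points[j]))[:k]
        relevances = []
        for j in neighbors:
            b = distance_bin(dist(p, points[j]), bin_width)
            a, c = sorted((labels[i], labels[j]))
            relevances.append(pcr.get((b, a, c), 0.0))
        avg = sum(relevances) / len(relevances)
        scores.append(1.0 / (avg + eps))
    return scores


# Toy data: a 6x3 grid, category "A" on the left half and "B" on the
# right half, with a single "B" planted at (1, 1) inside the "A" region.
points = [(x, y) for x in range(6) for y in range(3)]
labels = ["A" if x < 3 else "B" for x, y in points]
planted = points.index((1, 1))
labels[planted] = "B"

scores = outlier_scores(points, labels, k=3, bin_width=1.5, max_dist=1.5)
top = max(range(len(points)), key=scores.__getitem__)
print(points[top], labels[top])
```

On this toy grid the planted object is surrounded only by objects of the other category, so its average relevance is the lowest and its outlier score the highest. The all-pairs loop is quadratic in the number of objects and is meant only to make the definitions concrete; it says nothing about the efficiency techniques the paper proposes.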

Author information

Correspondence to Xutong Liu.

About this article

Cite this article

Liu, X., Chen, F. & Lu, CT. On detecting spatial categorical outliers. Geoinformatica 18, 501–536 (2014). https://doi.org/10.1007/s10707-013-0188-9

  • DOI: https://doi.org/10.1007/s10707-013-0188-9
