Abstract
Spatial outlier detection is an important research problem that has received much attention in recent years. Most existing approaches are designed for numerical attributes and are not applicable to categorical ones (e.g., binary, ordinal, and nominal), which are common in many applications. The main challenges are modeling spatial categorical dependency and achieving computational efficiency. This paper presents the first outlier detection framework for spatial categorical data. Specifically, a new metric, named Pair Correlation Ratio (PCR), is measured for each pair of category sets based on their co-occurrence frequencies at specific spatial distance ranges. The relevance between spatial objects is then calculated from PCR values with regard to their spatial distances. The outlierness of each object is defined as the inverse of the average relevance between the object and its spatial neighbors, and the objects with the highest outlier scores are returned as spatial categorical outliers. A set of algorithms is further designed for single-attribute and multi-attribute spatial categorical datasets. Extensive experimental evaluations on both simulated and real datasets demonstrate the effectiveness and efficiency of the proposed approaches.
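The abstract describes the pipeline only at a high level: a PCR value per category pair and distance range, a relevance score between objects derived from PCR, outlierness as the inverse of average relevance to neighbors, and a top-scoring selection. The following is a minimal single-attribute Python sketch of that pipeline under assumptions that are not taken from the paper: PCR is approximated here as the observed co-occurrence count of a category pair within a distance bin divided by the count expected if categories were placed independently of location, and an object's spatial neighbors are its k nearest objects. The function names `pcr_table`, `outlier_scores`, `top_outliers`, `distance_bin` and the bin edges are hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def distance_bin(d, edges):
    """Index k with edges[k] <= d < edges[k + 1], or None if d is out of range."""
    for k in range(len(edges) - 1):
        if edges[k] <= d < edges[k + 1]:
            return k
    return None


def pcr_table(points, edges):
    """ASSUMED PCR: observed / expected ordered co-occurrences of categories (a, b)
    per distance bin, with 'expected' taken from global category proportions
    (i.e. categories placed independently of location). points: (x, y, category)."""
    n = len(points)
    prop = {c: f / n for c, f in Counter(c for _, _, c in points).items()}
    observed = defaultdict(Counter)  # bin -> Counter of ordered category pairs
    pairs_in_bin = Counter()         # bin -> number of unordered object pairs
    for (xi, yi, ci), (xj, yj, cj) in combinations(points, 2):
        k = distance_bin(math.hypot(xi - xj, yi - yj), edges)
        if k is None:
            continue
        pairs_in_bin[k] += 1
        observed[k][(ci, cj)] += 1   # count both orders so the table is symmetric
        observed[k][(cj, ci)] += 1
    table = {}
    for k, total in pairs_in_bin.items():
        for a in prop:
            for b in prop:
                expected = 2 * total * (prop[a] * prop[b])
                table[(k, a, b)] = observed[k][(a, b)] / expected
    return table


def outlier_scores(points, edges, n_neighbors=3):
    """Outlierness = 1 / average PCR-based relevance to the n nearest objects
    (the k-nearest-neighbor neighborhood is an assumption)."""
    table = pcr_table(points, edges)
    scores = []
    for i, (xi, yi, ci) in enumerate(points):
        neighbors = sorted(
            ((math.hypot(xi - xj, yi - yj), cj)
             for j, (xj, yj, cj) in enumerate(points) if j != i),
            key=lambda t: t[0],
        )[:n_neighbors]
        relevance = []
        for d, cj in neighbors:
            k = distance_bin(d, edges)
            # relevance of a neighbor = PCR of the category pair at that distance bin
            relevance.append(0.0 if k is None else table[(k, ci, cj)])
        avg = sum(relevance) / len(relevance)
        scores.append(math.inf if avg == 0 else 1.0 / avg)
    return scores


def top_outliers(points, edges, m=1, n_neighbors=3):
    """Indices of the m objects with the highest outlier scores."""
    scores = outlier_scores(points, edges, n_neighbors)
    return sorted(range(len(points)), key=lambda i: scores[i], reverse=True)[:m]


if __name__ == "__main__":
    # Ten points on a line: an "A" run, then a "B" run, with one "B" planted at x=2.
    pts = [(x, 0, c) for x, c in enumerate("AABAABBBBB")]
    print(top_outliers(pts, edges=[0, 1.5, 100], m=1, n_neighbors=2))  # → [2]
```

With these bin edges, adjacent objects fall into one bin and all longer-range pairs into another. The planted "B" at index 2, whose two nearest neighbors are both "A", has the lowest average relevance and therefore the highest score. The paper's multi-attribute algorithms and its efficiency optimizations are not covered by this sketch.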
Cite this article
Liu, X., Chen, F. & Lu, CT. On detecting spatial categorical outliers. Geoinformatica 18, 501–536 (2014). https://doi.org/10.1007/s10707-013-0188-9
DOI: https://doi.org/10.1007/s10707-013-0188-9