Multi-domain anomaly detection in spatial datasets

  • Regular Paper
  • Knowledge and Information Systems

Abstract

A spatial anomaly captures a phenomenon occurring in a region whose behavior deviates sharply from that of the other, normal observations. In reality, however, such an anomaly may affect other phenomena in the region across multiple domains. For example, crime is often linked to other sociopolitical factors such as poverty and education. Similarly, accidents in a region may be linked to environmental factors such as weather and surface condition. Finding anomalies across multiple domains is therefore important in various applications. In this paper, we propose an approach for finding such a tangible anomalous window across multiple domains, where a window refers to a set of contiguous points in space; because the window is multi-domain, there are several overlapping windows in the same space across domains. Our approach comprises the following steps: (1) single-domain anomaly detection: discovering the anomalous window in each domain; (2) association rule mining: discovering relationships between the anomalous windows across domains; and (3) validation: validating the result using (a) Monte Carlo simulation, (b) correlation using lift and (c) ground truth evaluation. In addition, we provide a probabilistic framework to evaluate the relationships between the spatial nodes as a postprocessing step. Finally, we provide a visualization technique for viewing the multi-domain anomalous window and the probabilistic relationships between the nodes. We provide detailed experimental results and comparisons with other approaches using real-world health ranking [51] and transportation [50] datasets with known ground truth windows. The results show that our approach is effective in finding anomalies in multiple domains as compared to other approaches.
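
To make the validation step concrete, the following is a minimal sketch of how lift and a Monte Carlo check could be computed for two anomalous windows from different domains. It is an illustration under stated assumptions, not the authors' implementation: windows are modeled as plain sets of node identifiers, the random null windows ignore spatial contiguity, and the names `lift`, `monte_carlo_p_value`, `crime_window` and `poverty_window` are hypothetical.

```python
import random


def lift(nodes, window_a, window_b):
    """Lift of the rule 'node in window_a => node in window_b'.

    Values above 1 suggest the two windows co-occur on the same nodes
    more often than independence would predict.
    """
    n = len(nodes)
    p_a = len(window_a) / n
    p_b = len(window_b) / n
    p_ab = len(window_a & window_b) / n
    if p_a == 0 or p_b == 0:
        return 0.0  # lift is undefined for an empty window; report 0
    return p_ab / (p_a * p_b)


def monte_carlo_p_value(nodes, window_a, window_b, trials=999, seed=0):
    """Share of random windows whose lift is at least the observed lift.

    Assumption: the null model draws random node sets of the same size
    as window_b, without enforcing spatial contiguity.
    """
    rng = random.Random(seed)
    observed = lift(nodes, window_a, window_b)
    node_list = sorted(nodes)
    hits = 0
    for _ in range(trials):
        random_b = set(rng.sample(node_list, len(window_b)))
        if lift(nodes, window_a, random_b) >= observed:
            hits += 1
    # The +1 terms count the observed window as one of the replicates.
    return (hits + 1) / (trials + 1)


# Hypothetical data: 20 spatial nodes and one window per domain.
nodes = set(range(20))
crime_window = {0, 1, 2, 3, 4}
poverty_window = {1, 2, 3, 4, 5}

print(lift(nodes, crime_window, poverty_window))  # about 3.2
print(monte_carlo_p_value(nodes, crime_window, poverty_window))
```

A lift well above 1 together with a small Monte Carlo p-value would indicate that the overlap between the two windows is unlikely to arise by chance under this simplified null model. A contiguity-preserving null would be closer to the spatial setting described in the abstract.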

References

  1. Agarwal D, McGregor A, Phillips JM, Venkatasubramanian S, Zhu Z (2006) Spatial scan statistics: approximations and performance study. In: Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining (Philadelphia, PA, USA, August 20–23, 2006), KDD ’06. ACM, New York, NY, pp 24–33. doi:10.1145/1150402.1150410

  2. Agrawal R, Imielinski T, Swami A (1993) Mining association rules between sets of items in large databases. In: SIGMOD conference, pp 207–216

  3. Barnett V, Lewis T (1994) Outliers in statistical data. Wiley, New York

  4. Bonnie DR, Sorensen J, Guest Column (2011) Where you live matters to your health. http://www.news-journalonline.com/opinion/editorials/guest-columns/2010/07/12/where-you-live-matters-to-your-health.html. Last accessed March 2011

  5. Breiger RL (1974) The duality of persons and groups. University of North Carolina Press, Social Forces, Chapel Hill

  6. Chawla S, Sun P (2006) SLOM: a new measure for local spatial outliers. Knowl Inf Syst 9(4):412–429

  7. Computer science-advanced web and network technologies, and applications. Lecture Notes in Computer Science, 2008, vol 4977/2008, pp 99–109. doi:10.1007/978-3-540-89376-9

  8. Das K, Schneider J, Neill DB (2008) Anomaly pattern detection in categorical datasets. In: Proceedings of 14th ACM SIGKDD 2008. ACM, New York, pp 169–176

  9. de Vries T, Chawla S, Houle ME (2011) Density-preserving projections for large-scale local anomaly detection. Knowl Inf Syst. doi:10.1007/s10115-011-0430-4. Last accessed 9 Dec

  10. Dillard B, Shmueli G (2004) Simultaneous analysis of multiple time series using two-dimensional wavelets. Manuscript 1:1

  11. Ester M, Kriegel HP, Sander J, Xu X (1996) A density-based algorithm for discovering clusters in large spatial databases. In: Proceedings of the 2nd international conference on knowledge discovery and data mining, pp 44–49, USA. AAAI Press, Menlo Park

  12. Everett MG, Borgatti SP (1998) Analyzing clique overlap. Connections 21(1):49–61

  13. Han D, Rogerson PA, Nie J, Bonner MR, Vena JE, Vito D, Muti P, Trevisan M, Edge SB, Freudenheim JL (2004) Geographic clustering of residence in early life and subsequent risk of breast cancer (United States). Cancer Causes Control 15(9):921–929

  14. Harel D, Koren Y (2001) Clustering spatial data using random walks. In: Proceedings of the seventh international conference on knowledge discovery and data mining, pp 281–286, ACM Press, New York

  15. Health Statistics, Obesity (most recent) by country. http://www.nationmaster.com/graph/hea_obe-health-obesity. Last accessed March 2011

  16. Hido S, Tsuboi Y, Kashima H, Sugiyama M, Kanamori T (2011) Statistical outlier detection using direct density ratio estimation. Knowl Inf Syst 26(2):309–336

  17. Howe HL, Wingo PA, Thun MJ, Ries LA, Rosenberg HM, Feigal EG, Edwards BK (2001) Annual report to the nation on the status of cancer (1973 through 1998), featuring cancers with recent increasing trends. J Natl Cancer Inst 93(11):824–842

  18. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2732272/. Cancer outlier detection based on likelihood ratio test

  19. Hu J Cancer outlier detection based on likelihood ratio test. http://bioinformatics.oxfordjournals.org/content/24/19/2193.short

  20. Janeja VP, Adam N, Atluri V, Vaidya JS (March 2010) Spatial neighborhood based anomaly detection in sensor datasets. In: Special issue on outlier detection data mining and knowledge discovery, vol 20(2). Springer, Berlin, pp 221–258

  21. Janeja VP, Adam NR, Atluri V, Vaidya J (2010) Spatial neighborhood based anomaly detection in sensor datasets. Data Min Knowl Discov 2:221–258. doi:10.1007/s10618-009-0147-0

  22. Janeja VP, Atluri V (2008) Random walks to identify anomalous free-form spatial scan windows. In: IEEE TKDE 20(10):1378–1392

  23. Janeja VP, Atluri V, Vaidya JS, Adam N (2005) Collusion set detection through outlier discovery. In: IEEE intelligence and security informatics (IEEE ISI). Atlanta, Georgia

  24. Janet G (ed) (2008) State of the evidence the connection between breast cancer and the environment, 5th edn. Ph.D., published by the Breast Cancer Fund

  25. JGraph (2011) Java graph component for the visualization and layout of graphs. http://www.jgraph.com/. Last accessed 9 Dec 2011

  26. Han J, Kamber M (2006) Data mining: concepts and techniques, 2nd edn. Morgan Kaufmann

  27. Jung I, Kulldorff M, Klassen A (2007) A spatial scan statistic for ordinal data. Stat Med 26:1594–1607

  28. Knorr EM, Ng RT (2000) Distance-based outliers: algorithms and applications. VLDB J 8(3–4):237–253

  29. Kulldorff M (1997) A spatial scan statistic. Commun Stat Theory Methods 26:1481–1496

  30. Kulldorff M (1999) Spatial scan statistics: models, calculations and applications. In: Glaz J, Balakrishnan N (eds) Scan statistics and applications, statistics for industry and technology, pp 303–322

  31. Kulldorff M, Athas W, Feuer E, Miller B, Key C (1998) Evaluating cluster alarms: a space-time scan statistic and brain cancer in Los Alamos. Am J Public Health 88(9):1377–1380

  32. Kulldorff M, Nagarwalla N (1995) Spatial disease clusters: detection and inference. Stat Med 14:799–810

  33. Lu C, Chen D, Kou Y (2003) Detecting spatial outliers with multiple attributes. In: Proceedings of the 15th IEEE international conference on tools with artificial intelligence (ICTAI’03), p 122

  34. Multivariate scan statistics for disease surveillance. http://www.dbmi.pitt.edu/panda/papers/Kulldorff/k-M2005.pdf

  35. Naus J (1965) The distribution of the size of the maximum cluster of points on the line. J Am Stat Assoc 60:532–538

  36. Neill DB, Cooper GF, Das K, Jiang X, Schneider J (2009) Bayesian network scan statistics for multivariate pattern detection. In: Scan statistics: statistics for industry and technology, pp 221–249

  37. Neill DB, Moore AW, Cooper GF A multivariate Bayesian scan statistic

  38. New Jersey accident data for state routes. http://www.state.nj.us/transportation/refdata/accident/ (1999)

  39. Newman MEJ (2008) The mathematics of networks, The New Palgrave encyclopedia of economics

  40. NodeXL (2011) An excel 2007/2010 template for viewing network graphs. http://nodexl.codeplex.com/. Last accessed 9 Dec 2011

  41. Park Y, Priebe CE, Marchette DJ, Youssef A (2009) Anomaly detection using scan statistics on time series hypergraphs, workshop on link analysis, SDM 2009

  42. Patcha A, Park J-M (2007) An overview of anomaly detection techniques: existing solutions and latest technological trends. Comput Netw 51(12):3448–3470

  43. Rivers RW (2006) Evidence in traffic crash investigation and reconstruction: identification, interpretation and analysis of evidence, and the traffic crash investigation and reconstruction process

  44. Basu S, Meckesheimer M (2007) Automatic outlier detection for time series: an application to sensor data. Knowl Inf Syst 11(2):137–154

  45. Brin S, Page L (1998) The anatomy of a large-scale hypertextual web search engine. Comput Netw ISDN Syst 33:107–117

  46. Shi L, Janeja VP (2009) Anomalous window discovery through scan statistics for linear intersecting paths (SSLIP). In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining (Paris, France, June 28–July 01, 2009), KDD ’09. ACM, New York, NY, pp 767–776. doi:10.1145/1557019.1557104

  47. Shmueli G, Fienberg SE (2004) Current and potential statistical methods for monitoring multiple data streams for bio-surveillance. In: Wilson A, Olwell D (eds) Statistical methods in counter-terrorism

  48. Snyder D (2001) Online intrusion detection using sequences of system calls. M.S. thesis, Department of Computer Science, Florida State University

  49. Sslip:code, datasets and known window reports. http://userpages.umbc.edu/~leishi1/sslip/sslip.htm (2009)

  50. State of New Jersey, Department of Transportation, Crash records, http://www.state.nj.us/transportation/refdata/accident/. Last accessed March 2011

  51. The County Health Rankings, a key component of the mobilizing action toward community health (MATCH) project. http://www.countyhealthrankings.org/. Last accessed March 2011

  52. Wasserman S, Faust K (1994) Social network analysis. Cambridge University Press, Cambridge

  53. WEKA Weka 3: data mining software in Java. http://www.cs.waikato.ac.nz/ml/weka/. Last accessed March (2011)

  54. World Road Association, PIARC Road accident investigation guidelines for road engineers. http://www.who.int/roadsafety/news/piarc_manual.pdf

Author information

Correspondence to Vandana P. Janeja.

About this article

Cite this article

Janeja, V.P., Palanisamy, R. Multi-domain anomaly detection in spatial datasets. Knowl Inf Syst 36, 749–788 (2013). https://doi.org/10.1007/s10115-012-0534-5

  • DOI: https://doi.org/10.1007/s10115-012-0534-5
