ABSTRACT
Detection of space-time clusters is an important function in various domains (e.g., epidemiology and public health). The pioneering work on the spatial scan statistic is often used as the basis to detect and evaluate such clusters. State-of-the-art systems based on this approach detect clusters with restrictive shapes that cannot model growth and shifts in location over time. We extend these methods significantly by using the flexible square pyramid shape to model such effects. A heuristic search method is developed to detect the most likely clusters using a randomized algorithm in combination with geometric shape processing. As in the original scan statistic formulation, we use Monte Carlo methods to address the multiple hypothesis testing issues. Our method is applied to a real data set on brain cancer occurrences over a 19-year period. The cluster detected by our method shows both growth and movement, which could not have been modeled with the simpler cylindrical shapes used earlier. Our general framework can be extended quite easily to handle other flexible shapes for the space-time clusters.
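As a rough illustration of the Monte Carlo step mentioned above, the sketch below scores a set of candidate zones with a Kulldorff-style Poisson log likelihood ratio and estimates a p-value for the best-scoring zone by simulating case counts under the null hypothesis. It is a minimal sketch under stated assumptions, not the method of the paper: the function names (`poisson_llr`, `max_llr`, `monte_carlo_p_value`), the flat list of cells, and the explicit list of candidate zones are all hypothetical, and the square pyramid shapes and the heuristic search are not modeled.

```python
import math
import random


def poisson_llr(c, e, total):
    """Poisson log likelihood ratio for one candidate zone (hypothetical helper).

    c: observed cases in the zone, e: expected cases in the zone under the
    null, total: total cases in the study region. Assumes e > 0.
    Only an excess of cases (c > e) counts as evidence for a cluster.
    """
    if c <= e:
        return 0.0
    inside = c * math.log(c / e)
    # When every case lies inside the zone, the outside term vanishes.
    outside = 0.0 if c == total else (total - c) * math.log((total - c) / (total - e))
    return inside + outside


def max_llr(cases, population, zones):
    """Largest log likelihood ratio over all candidate zones."""
    total = sum(cases)
    pop_total = sum(population)
    best = 0.0
    for zone in zones:
        c = sum(cases[i] for i in zone)
        e = total * sum(population[i] for i in zone) / pop_total
        best = max(best, poisson_llr(c, e, total))
    return best


def monte_carlo_p_value(cases, population, zones, replicas=999, seed=0):
    """Rank the observed maximum against maxima from null replicas.

    Each replica redistributes all cases over the cells in proportion to
    population. Comparing maxima over the same zone set is what accounts
    for the many zones tested.
    """
    rng = random.Random(seed)
    observed = max_llr(cases, population, zones)
    total = sum(cases)
    cells = range(len(cases))
    exceed = 0
    for _ in range(replicas):
        sim = [0] * len(cases)
        for i in rng.choices(cells, weights=population, k=total):
            sim[i] += 1
        if max_llr(sim, population, zones) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (replicas + 1)


# Illustrative data only: six cells with equal population, excess in cells 2-3.
cases = [2, 3, 20, 18, 2, 3]
population = [100] * 6
# Candidate zones: all contiguous windows of one to three cells.
zones = [tuple(range(s, s + w)) for w in (1, 2, 3) for s in range(6 - w + 1)]
observed, p_value = monte_carlo_p_value(cases, population, zones)
```

Because each replica is scored by its own maximum over the same candidate zones, the resulting p-value refers to the most likely cluster as a whole rather than to any single zone, so no separate correction for the number of zones is applied.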