DOI: 10.1145/1281192.1281264

Statistical change detection for multi-dimensional data

Published: 12 August 2007

ABSTRACT

This paper deals with detecting change of distribution in multi-dimensional data sets. For a given baseline data set and a set of newly observed data points, we define a statistical test called the density test for deciding if the observed data points are sampled from the underlying distribution that produced the baseline data set. We define a test statistic that is strictly distribution-free under the null hypothesis. Our experimental results show that the density test has substantially more power than the two existing methods for multi-dimensional change detection.
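
The abstract does not spell out how the density test is constructed, so the following is only a minimal sketch of the general idea of comparing newly observed points against a baseline through an estimated density. It is not the paper's test statistic. Everything in it is an assumption made for illustration: the function names `gaussian_kde_logpdf` and `density_change_pvalue`, the fixed isotropic Gaussian bandwidth, the half-and-half split of the baseline, and the use of a permutation test to calibrate the statistic.

```python
import numpy as np


def gaussian_kde_logpdf(train, points, bandwidth):
    """Log-density of `points` under an isotropic Gaussian KDE fitted to `train`.

    Hypothetical helper: the bandwidth is a fixed scalar chosen by the caller.
    """
    n, d = train.shape
    # Squared distance between every query point and every training point.
    sq = ((points[:, None, :] - train[None, :, :]) ** 2).sum(axis=2)
    log_k = -0.5 * sq / bandwidth**2
    # Log-sum-exp over the training points, for numerical stability.
    m = log_k.max(axis=1, keepdims=True)
    lse = m[:, 0] + np.log(np.exp(log_k - m).sum(axis=1))
    log_norm = np.log(n) + 0.5 * d * np.log(2 * np.pi * bandwidth**2)
    return lse - log_norm


def density_change_pvalue(baseline, observed, bandwidth=0.5, n_perm=999, seed=0):
    """Permutation p-value for 'observed comes from the baseline distribution'.

    Assumed procedure (not taken from the paper):
      1. Fit a KDE on one half of the baseline.
      2. Score the held-out half and the observed points by log-density.
      3. Compare mean scores; calibrate by permuting the pooled scores.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(baseline))
    half = len(baseline) // 2
    fit, held = baseline[idx[:half]], baseline[idx[half:]]

    s_held = gaussian_kde_logpdf(fit, held, bandwidth)
    s_obs = gaussian_kde_logpdf(fit, observed, bandwidth)
    stat = abs(s_obs.mean() - s_held.mean())

    # Under the null, held-out and observed scores are exchangeable.
    pooled = np.concatenate([s_held, s_obs])
    n_obs = len(s_obs)
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        if abs(perm[:n_obs].mean() - perm[n_obs:].mean()) >= stat:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)
```

Calling `density_change_pvalue(baseline, observed)` with two arrays of shape `(n_points, n_dims)` returns a p-value; a small value suggests that the observed points were not drawn from the distribution that produced the baseline. The permutation step is used here because it needs no distributional assumption on the scores, which is in the spirit of, but not the same as, the distribution-free statistic the abstract describes.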

Published in

KDD '07: Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2007
1080 pages
ISBN: 9781595936097
DOI: 10.1145/1281192

Copyright © 2007 ACM

Publisher

Association for Computing Machinery, New York, NY, United States

Qualifiers

• Article

Acceptance Rates

KDD '07 paper acceptance rate: 111 of 573 submissions, 19%
Overall acceptance rate: 1,133 of 8,635 submissions, 13%
