DOI: 10.1145/1141277.1141426
Article

Evaluating the intrinsic dimension of evolving data streams

Published: 23 April 2006

ABSTRACT

Data streams are fundamental to several data processing applications that involve large amounts of data generated continuously as a sequence of events. Frequently, such events are not stored, so the data are analyzed and queried as they arrive and are discarded right away. In many applications these events are represented by a predetermined number of numerical attributes. Thus, without loss of generality, we can consider events as elements of a dimensional domain. A sequence of events in a data stream can be characterized by its intrinsic dimension, which in dimensional datasets is usually lower than the embedding dimensionality. Because the intrinsic dimension can be used to improve the performance of algorithms that handle dimensional data (especially query optimization), measuring it is also relevant to improving data stream processing and analysis. Moreover, it can be useful for forecasting data behavior. Hence, we present an algorithm that measures the intrinsic dimension of a data stream on the fly, following its continuously changing behavior. We also present experimental studies, using both real and synthetic data streams, showing that the results on well-understood datasets closely follow what is expected from the known behavior of the data.
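To make the idea of an intrinsic dimension that is lower than the embedding dimensionality concrete, the following is a minimal sketch and not the algorithm presented in the paper, which the abstract does not specify. It assumes the intrinsic dimension is approximated by the correlation fractal dimension, estimated by box counting as the slope of log S(r) against log r, where S(r) is the sum of squared cell occupancies for grid cells of side r. It also assumes every attribute is normalized to [0, 1). The names `correlation_dimension`, `WindowedDimension`, and `demo` are hypothetical, and the window is simply recomputed from scratch on each estimate.

```python
import math
import random
from collections import Counter, deque


def correlation_dimension(points, cell_sides=(1 / 2, 1 / 4, 1 / 8, 1 / 16)):
    """Box-counting estimate of the correlation fractal dimension.

    Assumption: every coordinate of every point lies in [0, 1).
    Returns the least-squares slope of log S(r) versus log r.
    """
    points = list(points)
    if not points:
        raise ValueError("need at least one point")
    log_r, log_s = [], []
    for r in cell_sides:
        # Count how many points fall in each grid cell of side r.
        counts = Counter(tuple(int(x / r) for x in p) for p in points)
        s = sum(c * c for c in counts.values())
        log_r.append(math.log(r))
        log_s.append(math.log(s))
    mean_r = sum(log_r) / len(log_r)
    mean_s = sum(log_s) / len(log_s)
    num = sum((a - mean_r) * (b - mean_s) for a, b in zip(log_r, log_s))
    den = sum((a - mean_r) ** 2 for a in log_r)
    return num / den


class WindowedDimension:
    """Keeps only the most recent events and estimates their dimension."""

    def __init__(self, window_size):
        # A bounded deque drops the oldest event when a new one arrives.
        self.window = deque(maxlen=window_size)

    def update(self, point):
        self.window.append(tuple(point))

    def estimate(self):
        return correlation_dimension(self.window)


def demo(seed=0, n=4000):
    """Synthetic stream: a filled square first, then a diagonal line."""
    rng = random.Random(seed)
    monitor = WindowedDimension(window_size=n)
    for _ in range(n):
        monitor.update((rng.random(), rng.random()))
    plane = monitor.estimate()
    for _ in range(n):
        x = rng.random()
        monitor.update((x, x))  # two attributes, one degree of freedom
    line = monitor.estimate()
    return plane, line
```

Calling `demo()` on this synthetic stream should give an estimate close to 2 while the window holds points spread over the unit square, and close to 1 once the window has filled with points on the diagonal, even though both phases are embedded in two attributes. Recomputing over the whole window keeps the sketch short; an on-the-fly method would update the cell counts incrementally instead.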

      • Published in

        SAC '06: Proceedings of the 2006 ACM symposium on Applied computing
        April 2006
        1967 pages
        ISBN: 1595931082
        DOI: 10.1145/1141277

        Copyright © 2006 ACM

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        Overall Acceptance Rate: 1,650 of 6,669 submissions, 25%
