DOI: 10.1145/2371316.2371366
short-paper

Reservoir sampling techniques in modern data analysis

Published: 16 September 2012

ABSTRACT

Reservoir sampling is a statistical sampling technique, developed almost 40 years ago to enable analysis of data that was large-scale for its time while using limited computer memory. We present an overview of frequently used reservoir sampling techniques and discuss how they can be used for learning from data streams. While they are not ideal for all scenarios, they can easily be modified for many purposes, and they also find a place in surprisingly useful modern data analysis approaches.
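
As an illustration of the idea described in the abstract, here is a minimal sketch of uniform reservoir sampling. The abstract does not say which variants the paper covers, so this assumes the classic scheme commonly known as Algorithm R; the names `reservoir_sample`, `k`, and `rng` are hypothetical and chosen only for this example.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(
    stream: Iterable[T], k: int, rng: Optional[random.Random] = None
) -> List[T]:
    """Draw k items uniformly at random from a stream of unknown length.

    Assumption: this is the classic "Algorithm R" scheme, not necessarily
    the exact variant discussed in the paper.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-based) replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


# Usage: sample 10 items from a stream without storing the stream itself.
sample = reservoir_sample(range(1000), 10, random.Random(0))
```

Only the `k` reservoir slots are kept in memory, regardless of how long the stream is, which is what makes the technique suitable for data streams. If the stream holds fewer than `k` items, the sketch simply returns all of them.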

    • Published in

      ACM Other conferences
      BCI '12: Proceedings of the Fifth Balkan Conference in Informatics
      September 2012
      312 pages
      ISBN: 9781450312400
      DOI: 10.1145/2371316

      Copyright © 2012 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall Acceptance Rate: 97 of 250 submissions, 39%