Sequential Sampling Algorithms: Unified Analysis and Lower Bounds

  • Conference paper
Stochastic Algorithms: Foundations and Applications (SAGA 2001)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2264)

Abstract

Sequential sampling algorithms have recently attracted interest as a way to design scalable algorithms for data mining and KDD processes. In this paper, we identify an elementary sequential sampling task (estimation from examples), from which one can derive many other tasks appearing in practice. We present a generic algorithm to solve this task and an analysis of its correctness and running time that is simpler and more intuitive than those existing in the literature. For two specific tasks, frequency and advantage estimation, we derive lower bounds on running time in addition to the general upper bounds.
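
As an illustration of the frequency-estimation task named in the abstract, the following is a minimal sketch of a sequential sampler. It is not the generic algorithm of the paper, whose details are not given here. It assumes a textbook stopping rule built from Hoeffding's inequality with a union bound over all steps, and the names `sequential_frequency_estimate`, `draw`, `epsilon`, `delta`, and `max_samples` are hypothetical. The sampler keeps drawing 0/1 examples and stops once the confidence radius is small relative to the running mean, so the number of examples adapts to the unknown frequency.

```python
import math
import random


def sequential_frequency_estimate(draw, epsilon, delta, max_samples=1_000_000):
    """Estimate the frequency p of 1s produced by draw() by sequential sampling.

    Illustrative sketch only (assumed stopping rule, not the paper's algorithm).
    Returns (estimate, samples_used, stopped). If stopped is True, then with
    probability at least 1 - delta the estimate is within a relative error
    epsilon of p.
    """
    successes = 0
    mean = 0.0
    for t in range(1, max_samples + 1):
        successes += draw()
        mean = successes / t
        # Hoeffding radius at step t with failure budget delta / (t * (t + 1));
        # these budgets sum to delta over all t, so every step holds at once.
        radius = math.sqrt(math.log(2 * t * (t + 1) / delta) / (2 * t))
        # If |mean - p| <= radius, then p >= mean - radius, so this condition
        # gives |mean - p| <= radius <= epsilon * (mean - radius) <= epsilon * p.
        if radius <= epsilon * (mean - radius):
            return mean, t, True
    # Sample budget exhausted before the stopping condition was met.
    return mean, max_samples, False


if __name__ == "__main__":
    rng = random.Random(0)
    estimate, used, stopped = sequential_frequency_estimate(
        lambda: rng.random() < 0.3, epsilon=0.1, delta=0.05
    )
    print(estimate, used, stopped)
```

Under this rule the number of examples drawn grows as the true frequency shrinks, which is the behaviour that motivates sampling sequentially rather than fixing the sample size in advance.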

Partially supported by EU EP27150 (Neurocolt II), by the IST Programme of the EU under contract number IST-1999-14186 (ALCOMFT), by CIRIT 1997SGR-00366, TIC2000-1970-CE, and by the Spanish Government PB98-0937-C04-04 (project FRESCO).

Supported in part by Grant-in-Aids for Scientific Research on Priority Areas “Discovery Science” (1998-2000) and for Scientific Research (C) (2001-2002) from the Ministry of Education, Science, Sports and Culture of Japan.

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gavaldà, R., Watanabe, O. (2001). Sequential Sampling Algorithms: Unified Analysis and Lower Bounds. In: Steinhöfel, K. (eds) Stochastic Algorithms: Foundations and Applications. SAGA 2001. Lecture Notes in Computer Science, vol 2264. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45322-9_12

  • DOI: https://doi.org/10.1007/3-540-45322-9_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-43025-4

  • Online ISBN: 978-3-540-45322-2

  • eBook Packages: Springer Book Archive
