Sequential Sampling Algorithms: Unified Analysis and Lower Bounds

Gavaldà, Ricard; Watanabe, Osamu

doi:10.1007/3-540-45322-9_12

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2264)

Included in the following conference series:

International Symposium on Stochastic Algorithms

Abstract

Sequential sampling algorithms have recently attracted interest as a way to design scalable algorithms for data mining and KDD processes. In this paper, we identify an elementary sequential sampling task (estimation from examples), from which one can derive many other tasks appearing in practice. We present a generic algorithm to solve this task and an analysis of its correctness and running time that is simpler and more intuitive than those existing in the literature. For two specific tasks, frequency and advantage estimation, we derive lower bounds on running time in addition to the general upper bounds.
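To make the idea of sequential sampling for frequency estimation concrete, here is a minimal sketch. It is not the generic algorithm from the paper, whose details are not given in this abstract. It assumes a standard construction: a Hoeffding confidence radius at each step, with the failure probability split across steps so that all intervals hold simultaneously with probability at least 1 - delta. The function name, its parameters, and the stopping rule are illustrative assumptions.

```python
import math
import random


def sequential_frequency_estimate(draw, epsilon, delta, max_samples=1_000_000):
    """Estimate p = P[draw() is True] to within relative error epsilon.

    Assumption (not taken from the paper): stop as soon as the Hoeffding
    radius is at most epsilon times the lower confidence bound on p.
    Returns (estimate, number_of_samples_used).
    """
    successes = 0
    for t in range(1, max_samples + 1):
        if draw():
            successes += 1
        # Split delta across steps: the sum over t of delta / (t (t + 1)) is delta.
        delta_t = delta / (t * (t + 1))
        # Two-sided Hoeffding radius for the mean of t samples in [0, 1].
        radius = math.sqrt(math.log(2.0 / delta_t) / (2.0 * t))
        p_hat = successes / t
        lower = p_hat - radius
        # If |p_hat - p| <= radius and p >= lower > 0, then stopping here
        # gives |p_hat - p| <= epsilon * lower <= epsilon * p.
        if lower > 0 and radius <= epsilon * lower:
            return p_hat, t
    raise RuntimeError("sample budget exhausted before the stopping rule fired")


if __name__ == "__main__":
    rng = random.Random(1)
    estimate, used = sequential_frequency_estimate(
        lambda: rng.random() < 0.3, epsilon=0.1, delta=0.05
    )
    print(estimate, used)
```

The number of samples drawn adapts to the unknown frequency: a larger p triggers the stopping rule sooner, which is the behaviour that motivates sequential over fixed-size sampling.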

Partially supported by EU EP27150 (Neurocolt II), by the IST Programme of the EU under contract number IST-1999-14186 (ALCOMFT), by CIRIT 1997SGR-00366, TIC2000-1970-CE, and by the Spanish Government PB98-0937-C04-04 (project FRESCO).

Supported in part by Grant-in-Aids for Scientific Research on Priority Areas “Discovery Science” (1998-2000) and for Scientific Research (C) (2001-2002) from the Ministry of Education, Science, Sports and Culture of Japan.

Author information

Authors and Affiliations

Universitat Politècnica de Catalunya, Barcelona, Spain
Ricard Gavaldà
Tokyo Institute of Technology, Tokyo, Japan
Osamu Watanabe

Editor information

Editors and Affiliations

Institute for Computer Architecture and Software Engineering, GMD - National Research Center for Information Technology, Kekuléstr. 7, 12489, Berlin-Adlershof, Germany
Kathleen Steinhöfel

About this paper

Cite this paper

Gavaldà, R., Watanabe, O. (2001). Sequential Sampling Algorithms: Unified Analysis and Lower Bounds. In: Steinhöfel, K. (ed.) Stochastic Algorithms: Foundations and Applications. SAGA 2001. Lecture Notes in Computer Science, vol 2264. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45322-9_12

DOI: https://doi.org/10.1007/3-540-45322-9_12
Published: 08 February 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43025-4
Online ISBN: 978-3-540-45322-2
eBook Packages: Springer Book Archive
