Abstract
Sequential sampling algorithms have recently attracted interest as a way to design scalable algorithms for data mining and KDD processes. In this paper, we identify an elementary sequential sampling task (estimation from examples), from which one can derive many other tasks appearing in practice. We present a generic algorithm to solve this task and an analysis of its correctness and running time that is simpler and more intuitive than those existing in the literature. For two specific tasks, frequency and advantage estimation, we derive lower bounds on running time in addition to the general upper bounds.
Partially supported by EU EP27150 (Neurocolt II), by the IST Programme of the EU under contract number IST-1999-14186 (ALCOMFT), by CIRIT 1997SGR-00366, TIC2000-1970-CE, and by the Spanish Government PB98-0937-C04-04 (project FRESCO).
Supported in part by Grant-in-Aids for Scientific Research on Priority Areas “Discovery Science” (1998-2000) and for Scientific Research (C) (2001-2002) from the Ministry of Education, Science, Sports and Culture of Japan.
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gavaldà, R., Watanabe, O. (2001). Sequential Sampling Algorithms: Unified Analysis and Lower Bounds. In: Steinhöfel, K. (eds) Stochastic Algorithms: Foundations and Applications. SAGA 2001. Lecture Notes in Computer Science, vol 2264. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45322-9_12
DOI: https://doi.org/10.1007/3-540-45322-9_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43025-4
Online ISBN: 978-3-540-45322-2
eBook Packages: Springer Book Archive