Abstract
A focus on scalability to very large datasets has distinguished the KDD endeavour right from the start of the field. In the present stage of its development, the field has begun to address the issue in earnest, and a number of different techniques for scaling up KDD algorithms have emerged. Traditionally, such techniques have concentrated on the search aspects of the problem, using algorithmic methods to avoid examining parts of the search space or exploiting properties of the underlying host systems to speed up processing. These techniques guarantee perfectly correct solutions, but can never reach sublinear complexity. In contrast, researchers have recently begun to take a fresh and principled look at stochastic sampling techniques, which give only an approximate quality guarantee but can make runtimes almost independent of the size of the database at hand. In the talk, we give an overview of both classes of approaches, using individual examples from our own work to illustrate in more detail how such techniques operate. We briefly outline how active learning elements may enhance KDD approaches in the future.
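The abstract contrasts exact search-based techniques with sampling-based approximation, but does not spell out why sampling can make runtimes nearly independent of database size. The sketch below (not taken from the paper; the function names and parameters are illustrative) uses the standard two-sided Hoeffding bound, under which the sample size needed to estimate a [0,1]-bounded quantity within error ε with confidence 1−δ depends only on ε and δ, not on the number of records.

```python
import math
import random


def hoeffding_sample_size(epsilon: float, delta: float) -> int:
    """Smallest n such that, for i.i.d. draws of a [0,1]-bounded quantity,
    the sample mean deviates from the true mean by more than epsilon with
    probability at most delta (two-sided Hoeffding bound).
    Note: n depends only on epsilon and delta, not on the database size."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def estimate_frequency(database, predicate, epsilon=0.01, delta=0.05, rng=None):
    """Estimate the fraction of records satisfying `predicate` by uniform
    sampling with replacement, with an (epsilon, delta) quality guarantee.
    Illustrative only; a hypothetical stand-in for sampling-based KDD steps."""
    rng = rng or random.Random()
    n = hoeffding_sample_size(epsilon, delta)
    hits = sum(predicate(database[rng.randrange(len(database))]) for _ in range(n))
    return hits / n


if __name__ == "__main__":
    # Toy database: one million records, roughly 30% of which satisfy the predicate.
    rng = random.Random(0)
    db = [rng.random() < 0.3 for _ in range(1_000_000)]
    est = estimate_frequency(db, lambda rec: rec, epsilon=0.01, delta=0.05, rng=rng)
    print(f"sample size: {hoeffding_sample_size(0.01, 0.05)}, estimate: {est:.3f}")
```

The same number of sampled records (about 18,000 for ε = 0.01, δ = 0.05) would suffice for a database of a billion rows, which is the sense in which such approximate techniques can decouple runtime from database size.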
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wrobel, S. (2001). Scalability, Search, and Sampling: From Smart Algorithms to Active Discovery. In: De Raedt, L., Siebes, A. (eds) Principles of Data Mining and Knowledge Discovery. PKDD 2001. Lecture Notes in Computer Science, vol 2168. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44794-6_45
DOI: https://doi.org/10.1007/3-540-44794-6_45
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42534-2
Online ISBN: 978-3-540-44794-8
eBook Packages: Springer Book Archive