Abstract
The last decade has witnessed impressive growth in Data Mining, in both algorithms and applications. Despite these advances, a computational theory of Data Mining is still largely missing. This paper discusses some aspects relevant to computation in Data Mining from the point of view of the Machine Learning theoretician. Computational techniques used in other fields that deal with learning from data, such as Statistics and Machine Learning, are potentially very relevant. However, the specifics of Data Mining are such that those techniques are most often not directly applicable; they need to be recast and re-analysed within Data Mining, starting from first principles. We illustrate this with a PAC-learnability analysis for a Data Mining-like task. We show that accounting for Data Mining-specific requirements, such as inference of weak predictors and agnosticity assumptions, requires generalising the classical PAC framework in novel ways.
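For orientation, the classical PAC setting that the abstract refers to can be illustrated with the standard textbook sample-size bounds for a finite hypothesis class. The sketch below is not the generalised framework developed in the paper. It only contrasts the realizable case (a consistent learner) with the agnostic case (Hoeffding inequality plus a union bound). The function names and the example class of Boolean conjunctions are assumptions chosen for illustration.

```python
import math


def pac_sample_size_realizable(hypothesis_count, epsilon, delta):
    """Examples sufficient for a consistent learner over a finite class H.

    Textbook bound: m >= (1/epsilon) * (ln|H| + ln(1/delta)).
    With probability at least 1 - delta, every hypothesis consistent
    with the sample has true error at most epsilon.
    """
    return math.ceil((math.log(hypothesis_count) + math.log(1.0 / delta)) / epsilon)


def pac_sample_size_agnostic(hypothesis_count, epsilon, delta):
    """Examples sufficient when no hypothesis in H need be perfect.

    Textbook bound: m >= (1/(2*epsilon^2)) * (ln|H| + ln(2/delta)).
    With probability at least 1 - delta, empirical and true error
    differ by at most epsilon for every hypothesis in H.
    """
    return math.ceil(
        (math.log(hypothesis_count) + math.log(2.0 / delta)) / (2.0 * epsilon ** 2)
    )


# Hypothetical example: conjunctions over 10 Boolean variables, where each
# variable appears positively, negatively, or not at all, so |H| = 3**10.
h_size = 3 ** 10
m_realizable = pac_sample_size_realizable(h_size, epsilon=0.1, delta=0.05)
m_agnostic = pac_sample_size_agnostic(h_size, epsilon=0.1, delta=0.05)
print(m_realizable, m_agnostic)
```

Dropping the assumption that a perfect hypothesis exists changes the dependence on the accuracy parameter from 1/epsilon to 1/epsilon^2, which gives a sense of why agnosticity assumptions affect the analysis.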
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Mărginean, F.A. (2003). Computational Aspects of Data Mining. In: Kumar, V., Gavrilova, M.L., Tan, C.J.K., L’Ecuyer, P. (eds) Computational Science and Its Applications — ICCSA 2003. ICCSA 2003. Lecture Notes in Computer Science, vol 2667. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44839-X_65
DOI: https://doi.org/10.1007/3-540-44839-X_65
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40155-1
Online ISBN: 978-3-540-44839-6
eBook Packages: Springer Book Archive