Computational Aspects of Data Mining

Mărginean, Flaviu Adrian

doi:10.1007/3-540-44839-X_65

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2667)

Included in the following conference series:

International Conference on Computational Science and Its Applications

Abstract

The last decade has witnessed impressive growth in Data Mining, in both algorithms and applications. Despite these advances, a computational theory of Data Mining is still largely missing. This paper discusses some aspects of computation in Data Mining from the point of view of the Machine Learning theoretician. Computational techniques used in other fields that deal with learning from data, such as Statistics and Machine Learning, are potentially very relevant. However, the specifics of Data Mining are such that those techniques are most often not directly applicable; they need to be recast and reanalysed within Data Mining, starting from first principles. We illustrate this with a PAC-learnability analysis for a Data Mining-like task. We show that accounting for Data Mining-specific requirements, such as the inference of weak predictors and agnosticity assumptions, requires generalising the classical PAC framework in novel ways.
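
For orientation, the contrast between the classical and the agnostic PAC settings mentioned in the abstract can be made concrete with the standard textbook sample-size bounds for a finite hypothesis class. The sketch below is not the paper's analysis, which the preview does not show; the function names and the example numbers are illustrative assumptions. In the classical (realisable) setting, a learner that outputs any hypothesis consistent with the sample needs about (ln|H| + ln(1/δ))/ε examples. In the agnostic setting, Hoeffding's inequality with a union bound gives (ln|H| + ln(2/δ))/(2ε²) examples for every empirical error to lie within ε of the true error, so that an empirical-error minimiser is within 2ε of the best hypothesis in the class.

```python
import math


def pac_sample_size_realisable(h_size: int, epsilon: float, delta: float) -> int:
    """Sufficient sample size for a consistent learner over a finite class.

    Assumes the target concept lies in the class (classical PAC setting).
    With probability at least 1 - delta, every hypothesis consistent with
    the sample has true error at most epsilon.
    """
    return math.ceil((math.log(h_size) + math.log(1.0 / delta)) / epsilon)


def pac_sample_size_agnostic(h_size: int, epsilon: float, delta: float) -> int:
    """Sufficient sample size for uniform convergence over a finite class.

    Makes no assumption that the target lies in the class (agnostic setting).
    With probability at least 1 - delta, every hypothesis has empirical error
    within epsilon of its true error (Hoeffding's inequality + union bound).
    """
    return math.ceil(
        (math.log(h_size) + math.log(2.0 / delta)) / (2.0 * epsilon ** 2)
    )


if __name__ == "__main__":
    # Illustrative numbers only: 1024 hypotheses, 10% error, 95% confidence.
    h_size, epsilon, delta = 2 ** 10, 0.1, 0.05
    print("realisable:", pac_sample_size_realisable(h_size, epsilon, delta))
    print("agnostic:  ", pac_sample_size_agnostic(h_size, epsilon, delta))
```

The dependence on 1/ε² rather than 1/ε is the price of dropping the assumption that a perfect hypothesis exists, which is one reason agnostic requirements change the analysis rather than merely its constants.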

Author information

Authors and Affiliations

Department of Computer Science, The University of York, Heslington, York, YO10 5DD, UK
Flaviu Adrian Mărginean

Editor information

Editors and Affiliations

Army High Performance Computing Research Center, USA
Vipin Kumar
Department of Computer Science, University of Calgary, Calgary, AB, T2N1N4, Canada
Marina L. Gavrilova
Heuchera Technologies Inc., 122 9251-8 Yonge Street, Richmond Hill, ON, Canada, L4C 9T3
Chih Jeng Kenneth Tan
Département d’informatique et de recherche opérationnelle, Université de Montréal, Montréal, Québec, H3C 3J7, Canada
Pierre L’Ecuyer
Department of Computer Science and Engineering, University of Minnesota, MN, 55455, USA
Vipin Kumar
The Queen’s University of Belfast, School of Computer Science, Belfast BT7 1NN, Northern Ireland, UK
Chih Jeng Kenneth Tan

About this paper

Cite this paper

Mărginean, F.A. (2003). Computational Aspects of Data Mining. In: Kumar, V., Gavrilova, M.L., Tan, C.J.K., L’Ecuyer, P. (eds) Computational Science and Its Applications — ICCSA 2003. ICCSA 2003. Lecture Notes in Computer Science, vol 2667. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44839-X_65

DOI: https://doi.org/10.1007/3-540-44839-X_65
Published: 18 June 2003
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40155-1
Online ISBN: 978-3-540-44839-6
eBook Packages: Springer Book Archive
