Computational Aspects of Data Mining

  • Conference paper
Computational Science and Its Applications — ICCSA 2003 (ICCSA 2003)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2667)

Abstract

The last decade has witnessed impressive growth in Data Mining, in both algorithms and applications. Despite these advances, a computational theory of Data Mining is still largely lacking. This paper discusses some aspects of computation in Data Mining from the point of view of the Machine Learning theoretician. Computational techniques used in other fields that deal with learning from data, such as Statistics and Machine Learning, are potentially very relevant. However, the specifics of Data Mining are such that those techniques are most often not directly applicable; they need to be recast and re-analysed within Data Mining, starting from first principles. We illustrate this with a PAC-learnability analysis for a Data Mining-like task. We show that accounting for requirements specific to Data Mining, such as inference of weak predictors and agnosticity assumptions, requires generalising the classical PAC framework in novel ways.
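
For readers unfamiliar with the classical PAC framework that the abstract refers to, the following minimal sketch computes the standard textbook sample-size bounds for a finite hypothesis class H. It is not taken from the paper, whose generalised framework is not shown on this page. The realisable bound assumes some hypothesis in H is consistent with the target; the agnostic bound drops that assumption and follows from Hoeffding's inequality with a union bound over H. The function names and the example class are illustrative assumptions.

```python
import math


def pac_sample_bound(h_size, epsilon, delta):
    """Realisable case: m >= (1/epsilon) * (ln|H| + ln(1/delta)).

    With at least this many examples, any hypothesis consistent with the
    sample has true error at most epsilon, with probability >= 1 - delta.
    """
    return math.ceil((math.log(h_size) + math.log(1.0 / delta)) / epsilon)


def agnostic_sample_bound(h_size, epsilon, delta):
    """Agnostic case: m >= (1/(2*epsilon^2)) * (ln|H| + ln(2/delta)).

    With at least this many examples, every hypothesis in H has empirical
    error within epsilon of its true error, with probability >= 1 - delta,
    so an empirical-risk minimiser is within 2*epsilon of the best in H.
    """
    return math.ceil(
        (math.log(h_size) + math.log(2.0 / delta)) / (2.0 * epsilon ** 2)
    )


if __name__ == "__main__":
    # Illustrative class (an assumption): conjunctions over 10 Boolean
    # variables, where each variable is positive, negated, or absent.
    h_size = 3 ** 10
    print(pac_sample_bound(h_size, epsilon=0.1, delta=0.05))
    print(agnostic_sample_bound(h_size, epsilon=0.1, delta=0.05))
```

The agnostic bound grows as 1/epsilon^2 rather than 1/epsilon, which gives a rough sense of the cost of dropping the assumption that the target lies in the hypothesis class.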

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Mărginean, F.A. (2003). Computational Aspects of Data Mining. In: Kumar, V., Gavrilova, M.L., Tan, C.J.K., L’Ecuyer, P. (eds) Computational Science and Its Applications — ICCSA 2003. ICCSA 2003. Lecture Notes in Computer Science, vol 2667. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44839-X_65

  • DOI: https://doi.org/10.1007/3-540-44839-X_65

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40155-1

  • Online ISBN: 978-3-540-44839-6

  • eBook Packages: Springer Book Archive
