Skip to main content

Towards a General Framework for Data Mining

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 4747))

Abstract

In this paper, we address the ambitious task of formulating a general framework for data mining. We discuss the requirements that such a framework should fulfill: It should elegantly handle different types of data, different data mining tasks, and different types of patterns/models. We also discuss data mining languages and what they should support: this includes the design and implementation of data mining algorithms, as well as their composition into nontrivial multi-step knowledge discovery scenarios relevant for practical application. We proceed by laying out some basic concepts, starting with (structured) data and generalizations (e.g., patterns and models) and continuing with data mining tasks and basic components of data mining algorithms (i.e., refinement operators, distances, features and kernels). We next discuss how to use these concepts to formulate constraint-based data mining tasks and design generic data mining algorithms. We finally discuss how these components would fit in the overall framework and in particular into a language for data mining and knowledge discovery.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Agrawal, R., Imielinski, T., Swami, A.: Mining association rules between sets of items in large databases. In: Proc. of the ACM SIGMOD Conf. on Management of Data, pp. 207–216. ACM Press, New York (1993)

    Google Scholar 

  2. Aho, A.V., Ullman, J.D., Hopcroft, J.E.: Data Structures and Algorithms. Addison-Wesley, Reading, MA (1983)

    MATH  Google Scholar 

  3. Allison, L.: Models for machine learning and data mining in functional programming. Journal of Functional Programming 15(1), 15–32 (2004)

    Article  MathSciNet  Google Scholar 

  4. R. Bayardo (ed.) Constraints in data mining. Special issue of SIGKDD Explorations, 4(1) (2002)

    Google Scholar 

  5. Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, Berlin (2006)

    MATH  Google Scholar 

  6. Bistarelli, S., Bonch, F.: Interestingness is not a Dichotomy: Introducing Softness in Constrained Pattern Mining. In: Jorge, A.M., Torgo, L., Brazdil, P.B., Camacho, R., Gama, J. (eds.) PKDD 2005. LNCS (LNAI), vol. 3721, Springer, Heidelberg (2005)

    Chapter  Google Scholar 

  7. Blockeel, H., De Raedt, L., Ramon, J.: Top-down induction of clustering trees. In: Proc. of the 15th Intl. Conf. on Machine Learning, pp. 55–63. Morgan Kaufmann, San Mateo, CA (1998)

    Google Scholar 

  8. Boulicaut, J.-F., Jeudy, B.: Constraint-based data mining. In: Maimon, O., Rokach, L. (eds.) The Data Mining and Knowledge Discovery Handbook, pp. 399–416. Springer, Berlin (2005)

    Chapter  Google Scholar 

  9. Boulicaut, J.-F., Masson, C.: Data mining query languages. In: Maimon, O., Rokach, L. (eds.) The Data Mining and Knowledge Discovery Handbook, Springer, Berlin (2005)

    Google Scholar 

  10. Boulicaut, J.-F., Klemettinen, M., Mannila, H.: Modeling KDD processes within the inductive database framework. In: Mohania, M.K., Tjoa, A.M. (eds.) DaWaK 1999. LNCS, vol. 1676, pp. 293–302. Springer, Heidelberg (1999)

    Google Scholar 

  11. Boulicaut, J.-F., De Raedt, L., Mannila, H. (eds.): Constraint-Based Mining and Inductive Databases. LNCS (LNAI), vol. 3848. Springer, Heidelberg (2006)

    Google Scholar 

  12. Bracewell, R.N.: The Fourier Transform and Its Applications. McGraw-Hill, New York (1965)

    MATH  Google Scholar 

  13. Calders, T., Rigotti, C., Boulicaut, J.-F.: A survey on condensed representations for frequent sets. In: Boulicaut, J-F., De Raedt, L., Mannila, H. (eds.) Constraint-Based Mining and Inductive Databases. LNCS (LNAI), vol. 3848, pp. 64–80. Springer, Heidelberg (2006)

    Chapter  Google Scholar 

  14. Calders, T., Goethals, B., Prado, A.B.: Integrating pattern mining in relational databases. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) PKDD 2006. LNCS (LNAI), vol. 4213, pp. 454–461. Springer, Heidelberg (2006a)

    Chapter  Google Scholar 

  15. Calders, T., Lakshmanan, L.V.S., Ng, R.T., Paredaens, J.: Expressive power of an algebra for data mining. ACM Transactions on Database Systems 31(4), 1169–1214 (2006b)

    Article  Google Scholar 

  16. Cheng, H., Yan, X., Han, J., Hsu, C.-W.: Discriminative frequent pattern analysis for effective classification. In: Proc. 23nd Intl. Conf. on Data Engineering, pp. 716–725. IEEE Computer Society Press, Los Alamitos (2007)

    Google Scholar 

  17. Deb, K.: Multi-Objective Optimization Using Evolutionary Algorithms. Wiley & Sons, New York (2001)

    MATH  Google Scholar 

  18. De Raedt, L., Dehaspe, L.: Clausal discovery. Machine Learning 26, 99–146 (1997)

    Article  MATH  Google Scholar 

  19. Dehaspe, L., Toivonen, H.: Discovery of frequent Datalog patterns. Data Mining and Knowledge Discovery 3(1), 7–36 (1999)

    Article  Google Scholar 

  20. De Raedt, L.: A perspective on inductive databases. SIGKDD Explorations 4(2), 69–77 (2002a)

    Article  Google Scholar 

  21. De Raedt, L.: Data mining as constraint logic programming. In: Kakas, A.C., Sadri, F. (eds.) Computational Logic: Logic Programming and Beyond. LNCS (LNAI), vol. 2408, pp. 113–125. Springer, Heidelberg (2002b)

    Chapter  Google Scholar 

  22. Durbin, R., Eddy, S.R., Krogh, A., Mitchison, G.J.: Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, Cambridge (1998)

    MATH  Google Scholar 

  23. Džeroski, S.: Inductive logic programming in a nutshell. In: Getoor, L., Taskar, B. (eds.) Statistical Relational Learning, MIT Press, Cambridge, MA (2007)

    Google Scholar 

  24. Džeroski, S., Lavrač, N. (eds.): Relational Data Mining. Springer, Berlin (2001)

    Google Scholar 

  25. Džeroski, S., Todorovski, L., Ljubič, P.: Inductive queries on polynomial equations. In: Boulicaut, J-F., De Raedt, L., Mannila, H. (eds.) Constraint-Based Mining and Inductive Databases. LNCS (LNAI), vol. 3848, pp. 127–154. Springer, Heidelberg (2006)

    Chapter  Google Scholar 

  26. Fayyad, U., Piatetsky-Shapiro, G., Uthurusamy, R.: Summary from the KDD-2003 panel – “Data Mining: The Next 10 Years”. SIGKDD Explorations 5(2), 191–196 (2003)

    Article  Google Scholar 

  27. Friedman, J.H., Fisher, N.I.: Bump hunting in high-dimensional data. Statistics and Computing 9(2), 123–143 (1999)

    Article  Google Scholar 

  28. Fayyad, U., Piatetsky-Shapiro, G., Smyth, P.: From data mining to knowledge discovery: An overview. In: Fayyad, U., Piatetsky-Shapiro, G., Smyth, P., Uthurusamy, R. (eds.) Advances in Knowledge Discovery and Data Mining, pp. 495–515. MIT Press, Cambridge, MA (1996)

    Google Scholar 

  29. Frawley, W.J., Piatetsky-Shapiro, G., Matheus, C.J.: Knowledge discovery in databases: An overview. In: Knowledge Discovery in Databases, pp. 1–30. AAAI/MIT Press, Cambridge

    Google Scholar 

  30. Gaertner, T.: A survey of kernels for structured data. SIGKDD Explorations 5(1), 49–58 (2003)

    Article  MathSciNet  Google Scholar 

  31. Garofalakis, M., Hyun, D., Rastogi, R., Shim, K.: Building decision trees with constraints. Data Mining and Knowledge Discovery 7(2), 187–214 (2003)

    Article  MathSciNet  Google Scholar 

  32. Getoor, L., Taskar, B. (eds.): Statistical Relational Learning. MIT Press, Cambridge, MA (2007)

    Google Scholar 

  33. Gionis, A., Mannila, H., Tsaparas, P.: Clustering aggregation. In: Proc. of the 21st Intl. Conf. on Data Engineering, pp. 341–352. IEEE Computer Society Press, Los Alamitos (2005)

    Google Scholar 

  34. Han, J., Kamber, M.: Data Mining: Concepts and Techniques. Morgan Kaufmann, San Francisco, CA (2001)

    Google Scholar 

  35. Hand, D.J., Mannila, H., Smyth, P.: Principles of Data Mining. MIT Press, Cambridge, MA (2001)

    Google Scholar 

  36. Haussler, D.: Convolution kernels on discrete structures. UC Santa Cruz, Technical Report UCS-CRL-99-10 (1999)

    Google Scholar 

  37. Imielinski, T., Mannila, H.: A database perspective on knowledge discovery. Communications of the ACM 39(11), 58–64 (1996)

    Article  Google Scholar 

  38. Johnson, T., Lakshmanan, L.V., Ng, R.: The 3W model and algebra for unified data mining. In: Proc. of the Intl. Conf. on Very Large Data Bases, pp. 21–32. Morgan Kaufmann, San Francisco, CA (2000)

    Google Scholar 

  39. Kalousis, A., Woznica, A., Hilario, M.: A unifying framework for relational distance-based learning founded on relational algebra. Technical Report, Computer Science Department, University of Geneva (2006)

    Google Scholar 

  40. Kaufman, L., Rousseeuw, P.J.: Finding Groups in Data: An Introduction to Cluster Analysis. Wiley & Sons, New York (1990)

    Google Scholar 

  41. King, R.D., Karwath, A., Clare, A., Dehaspe, L.: The utility of different representations of protein sequence for predicting functional class. Bioinformatics 17(5), 445–454 (2001)

    Article  Google Scholar 

  42. Kloesgen, W.: Data mining tasks and methods: Subgroup discovery: deviation analysis. In: Kloesgen, W., Zytkow, J.M. (eds.) Handbook of Data Mining and Knowledge Discovery, pp. 354–361. Oxford University Press, Oxford (2002)

    Google Scholar 

  43. Kramer, S., Aufschild, V., Hapfelmeier, A., Jarasch, A., Kessler, K., Reckow, S., Wicker, J., Richter, L.: Inductive Databases in the Relational Model: The Data as the Bridge. In: Bonchi, F., Boulicaut, J.-F. (eds.) KDID 2005. LNCS, vol. 3933, pp. 124–138. Springer, Heidelberg (2006)

    Google Scholar 

  44. Lavrač, N., Kavšek, B., Flach, P.A., Todorovski, L.: Subgroup Discovery with CN2-SD. Journal of Machine Learning Research 5, 153–188 (2004)

    Google Scholar 

  45. Lavrač, N., Džeroski, S.: Inductive Logic Programming: Techniques and Applications. Ellis Horwood, Chichester (1994)

    Google Scholar 

  46. Liu, H., Motoda, H.: Feature Extraction, Construction and Selection: A Data Mining Perspective. Kluwer, Dorderecht (1998)

    MATH  Google Scholar 

  47. Lloyd, J.W.: Foundations of Logic Programming. Springer, Berlin (1987)

    MATH  Google Scholar 

  48. Lloyd, J.W.: An introduction to deductive database systems. Australian Computer Journal 15(2), 52–57 (1983)

    MathSciNet  Google Scholar 

  49. Lloyd, J.W.: Logic for Learning. Springer, Berlin (2003)

    MATH  Google Scholar 

  50. Mallat, S.: A Wavelet Tour of Signal Processing. Academic Press, London (1999)

    MATH  Google Scholar 

  51. Inductive databases vision: Relational operations on models. Unpublished slides. In: Presented at the meeting of the cInQ project (December 2001)

    Google Scholar 

  52. Mannila, H., Toivonen, H.: Levelwise search and borders of theories in knowledge discovery. Data Mining and Knowledge Discovery 1(3), 241–258 (1997)

    Article  Google Scholar 

  53. Michalski, R.S.: Knowledge acquisition through conceptual clustering: A theoretical framework and an algorithm for partitioning data into conjunctive concepts. Intl. Jrnl. of Policy Analysis and Information Systems 4, 219–244 (1980)

    MathSciNet  Google Scholar 

  54. Mitchell, T.M.: Generalization as search. Artif. Intell. 18(2), 203–226 (1982)

    Article  Google Scholar 

  55. Nijssen, S., Fromont, E.: Mining optimal decision trees from itemset lattices. In: Proc. of The 13th ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, ACM Press, New York (to appear, 2007)

    Google Scholar 

  56. Piatetsky-Shapiro, G., Djeraba, C., Getoor, L., Grossman, R., Feldman, R., Zaki, M.: What are the grand challenges for data mining? KDD-2006 Panel report. SIGKDD Explorations 8(2), 70–77 (2006)

    Article  Google Scholar 

  57. Ramakrishnan, R., et al.: Data Mining: The Next Generation. In: Ramakrishnan, R., Agrawal, R., Freytag, J.-C. (eds.) Perspectives Wshp. – Data Mining: The Next Generation. Intl. Begegnungs- und Forschungszentrum fuer Informatik (IBFI), Schloss Dagstuhl, Germany (2005)

    Google Scholar 

  58. Ramon, J., Bruynooghe, M.: A polynomial time computable metric between point sets. Acta Informatica 37(10), 765–780 (2001)

    Article  MATH  MathSciNet  Google Scholar 

  59. Shawe-Taylor, J., Cristianini, N.: Kernel Methods for Pattern Analysis. Cambridge University Press, Cambridge (2004)

    Google Scholar 

  60. Siebes, A.: Data mining in inductive databases. In: Bonchi, F., Boulicaut, J-F. (eds.) KDID 2005. LNCS, vol. 3933, pp. 1–23. Springer, Heidelberg (2006)

    Google Scholar 

  61. Srinivasan, A., King, R.D.: Feature construction with inductive logic programming: A study of quantitative predictions of biological activity aided by structural attributes. Knowledge Discovery and Data Mining 3(1), 37–57 (1999)

    Article  Google Scholar 

  62. Struyf, J., Džeroski, S.: Constraint based induction of multi-objective regression trees. In: Bonchi, F., Boulicaut, J-F. (eds.) KDID 2005. LNCS, vol. 3933, pp. 222–233. Springer, Heidelberg (2006)

    Google Scholar 

  63. Termier, A., Tamada, Y., Imoto, S., Washio, T., Higuchi, T.: From closed tree mining towards closed DAG mining. In: Proc. of the Intl. Wshp. on Data Mining and Statistical Science, pp. 1–7 (2006)

    Google Scholar 

  64. Thompson, S.: Haskell: The Craft of Functional Programming. Add. Wesley, Reading (1999)

    Google Scholar 

  65. Tušar, T.: Design of an Algorithm for Multiobjective Optimization with Differential Evolution. M.Sc. Thesis. Faculty of Computer and Information Science, University of Ljubljana, Slovenia (2007)

    Google Scholar 

  66. Vilalta, R., Drissi, Y.: A perspective view and survey of meta-learning. Artificial Intelligence Review 18(2), 77–95 (2002)

    Article  Google Scholar 

  67. Wagstaff, K., Cardie, C.: Clustering with instance-level constraints. In: Proc. 17th Intl. Conf. on Machine Learning, pp. 1103–1110. Morgan Kaufmann, San Francisco, CA (2000)

    Google Scholar 

  68. Woznica, A., Kalousis, A., Hilario, M.: Kernels on lists and sets over relational algebra: an application to classification of protein fingerprints. In: Ng, W-K., Kitsuregawa, M., Li, J., Chang, K. (eds.) PAKDD 2006. LNCS (LNAI), vol. 3918, pp. 546–551. Springer, Heidelberg (2006)

    Chapter  Google Scholar 

  69. Yang, Q., Wu, X.: 10 Challenging problems in data mining research. Intl. Jrnl. of Information Technology & Decision Making 5(4), 597–604 (2006)

    Article  Google Scholar 

  70. Ženko, B., Džeroski, S., Struyf, J.: Learning predictive clustering rules. In: Bonchi, F., Boulicaut, J-F. (eds.) KDID 2005. LNCS, vol. 3933, pp. 234–250. Springer, Heidelberg (2006)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Sašo Džeroski Jan Struyf

Rights and permissions

Reprints and permissions

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Džeroski, S. (2007). Towards a General Framework for Data Mining. In: Džeroski, S., Struyf, J. (eds) Knowledge Discovery in Inductive Databases. KDID 2006. Lecture Notes in Computer Science, vol 4747. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75549-4_16

Download citation

  • DOI: https://doi.org/10.1007/978-3-540-75549-4_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-75548-7

  • Online ISBN: 978-3-540-75549-4

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics