Towards a General Framework for Data Mining

Džeroski, Sašo

doi:10.1007/978-3-540-75549-4_16

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4747)

Abstract

In this paper, we address the ambitious task of formulating a general framework for data mining. We discuss the requirements that such a framework should fulfill: It should elegantly handle different types of data, different data mining tasks, and different types of patterns/models. We also discuss data mining languages and what they should support: this includes the design and implementation of data mining algorithms, as well as their composition into nontrivial multi-step knowledge discovery scenarios relevant for practical application. We proceed by laying out some basic concepts, starting with (structured) data and generalizations (e.g., patterns and models) and continuing with data mining tasks and basic components of data mining algorithms (i.e., refinement operators, distances, features and kernels). We next discuss how to use these concepts to formulate constraint-based data mining tasks and design generic data mining algorithms. We finally discuss how these components would fit in the overall framework and in particular into a language for data mining and knowledge discovery.
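
As a rough illustration of how the basic components named above might compose into a generic algorithm, the following sketch (not taken from the paper) parameterizes a levelwise search by a refinement operator and a constraint. All names (`refine_itemset`, `support`, `generic_search`) and the toy data are hypothetical, and the sketch assumes the constraint is anti-monotone, so that a pattern violating it need not be refined further.

```python
from typing import Callable, FrozenSet, Iterable, List

Pattern = FrozenSet[str]

def refine_itemset(pattern: Pattern, items: Iterable[str]) -> List[Pattern]:
    """Refinement operator: specialize an itemset by adding one item."""
    return [pattern | {i} for i in sorted(items) if i not in pattern]

def support(pattern: Pattern, data: List[Pattern]) -> int:
    """Number of transactions that contain every item of the pattern."""
    return sum(1 for t in data if pattern <= t)

def generic_search(
    start: Pattern,
    refine: Callable[[Pattern], List[Pattern]],
    constraint: Callable[[Pattern], bool],
) -> List[Pattern]:
    """Levelwise search: refine only patterns that satisfy the constraint."""
    solutions: List[Pattern] = []
    frontier = [start]
    seen = {start}
    while frontier:
        next_frontier = []
        for p in frontier:
            for q in refine(p):
                if q in seen:
                    continue
                seen.add(q)
                if constraint(q):  # anti-monotone: failures are pruned here
                    solutions.append(q)
                    next_frontier.append(q)
        frontier = next_frontier
    return solutions

# Toy transaction data (hypothetical).
data = [frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("b")]
items = {"a", "b", "c"}

# Frequent itemset mining as one instance: minimum support of 2.
result = generic_search(
    frozenset(),
    lambda p: refine_itemset(p, items),
    lambda p: support(p, data) >= 2,
)
```

Swapping in a different refinement operator (for example, over sequences or trees) or a different constraint would reuse `generic_search` unchanged, which is the kind of reuse a general framework aims for.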

Author information

Authors and Affiliations

Jožef Stefan Institute, Jamova 39, 1000 Ljubljana, Slovenia
Sašo Džeroski

Editor information

Sašo Džeroski, Jan Struyf

About this paper

Cite this paper

Džeroski, S. (2007). Towards a General Framework for Data Mining. In: Džeroski, S., Struyf, J. (eds) Knowledge Discovery in Inductive Databases. KDID 2006. Lecture Notes in Computer Science, vol 4747. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75549-4_16

DOI: https://doi.org/10.1007/978-3-540-75549-4_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-75548-7
Online ISBN: 978-3-540-75549-4
eBook Packages: Computer Science (R0)
