Abstract
In this chapter we consider box clustering, a supervised classification method that partitions the feature space with particularly simple convex sets (boxes). Box clustering produces systems of logic rules from data in numerical form; such rules explicitly represent the logic relations hidden in the data with respect to a target class. The algorithm adopted to solve the box clustering problem is a simple and fast agglomerative method whose outcome can be affected by the choice of the starting point and by the aggregation rules it follows. We propose and motivate a randomized approach that generates a large number of candidate models from different data samples and then selects the best candidate according to two criteria: model size, measured by the number of boxes in the model, and model precision, measured by the error on the test split. We adopt a Pareto-optimal strategy for this choice, under the hypothesis that it identifies simple models with good predictive power. The procedure is applied to a wide range of well-known data sets to evaluate to what extent the results confirm this hypothesis, and its performance is compared with that of competing methods.
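The Pareto-optimal selection described above can be sketched as follows. This is a minimal illustration, not the chapter's implementation: each candidate model is assumed to carry two hypothetical scores, a box count and a test-split error, and a candidate is retained only if no other candidate is at least as good on both criteria and strictly better on one.

```python
def pareto_front(candidates):
    """Return the candidates not dominated on (n_boxes, test_error).

    A candidate c is dominated if some other candidate o is no worse
    on both criteria and strictly better on at least one.
    """
    front = []
    for c in candidates:
        dominated = any(
            o["n_boxes"] <= c["n_boxes"]
            and o["test_error"] <= c["test_error"]
            and (o["n_boxes"] < c["n_boxes"] or o["test_error"] < c["test_error"])
            for o in candidates
        )
        if not dominated:
            front.append(c)
    return front


# Illustrative candidate models (field names and values are hypothetical):
models = [
    {"name": "A", "n_boxes": 3, "test_error": 0.12},
    {"name": "B", "n_boxes": 5, "test_error": 0.08},
    {"name": "C", "n_boxes": 7, "test_error": 0.10},  # dominated by B
    {"name": "D", "n_boxes": 2, "test_error": 0.20},
]
best = pareto_front(models)  # keeps A, B, and D
```

Among the surviving non-dominated models, any further tie-breaking (e.g., preferring the smallest model) reflects the hypothesis that simpler models generalize better.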
Copyright information
© 2010 Springer Science+Business Media, LLC
Cite this chapter
Felici, G., Simeone, B., Spinelli, V. (2010). Classification Techniques and Error Control in Logic Mining. In: Stahlbock, R., Crone, S., Lessmann, S. (eds) Data Mining. Annals of Information Systems, vol 8. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-1280-0_5
Print ISBN: 978-1-4419-1279-4
Online ISBN: 978-1-4419-1280-0