Abstract
In this chapter we consider box clustering, a supervised classification method that partitions the feature space with particularly simple convex sets (boxes). Box clustering produces systems of logic rules from data in numerical form; such rules explicitly represent the logic relations hidden in the data with respect to a target class. The algorithm adopted to solve the box clustering problem is a simple and fast agglomerative method whose outcome can be affected by the choice of the starting point and by the aggregation rules it follows. We propose and motivate a randomized approach that generates a large number of candidate models from different data samples and then selects the best candidate according to two criteria: model size, measured by the number of boxes in the model, and model precision, measured by the error on the test split. We adopt a Pareto-optimal strategy for this choice, under the hypothesis that it identifies simple models with good predictive power. The procedure is applied to a wide range of well-known data sets to evaluate to what extent the results confirm this hypothesis, and its performance is compared with that of competing methods.
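The Pareto-optimal selection described above can be sketched as follows. This is a minimal illustration, not the chapter's implementation: each candidate model is assumed to carry two hypothetical scores, a box count and a test-split error, and a candidate is retained only if no other candidate is at least as good on both criteria and strictly better on one.

```python
def pareto_front(candidates):
    """Return the candidates not dominated on (n_boxes, test_error).

    A candidate c is dominated if some other candidate o is no worse
    on both criteria and strictly better on at least one.
    """
    front = []
    for c in candidates:
        dominated = any(
            o["n_boxes"] <= c["n_boxes"]
            and o["test_error"] <= c["test_error"]
            and (o["n_boxes"] < c["n_boxes"] or o["test_error"] < c["test_error"])
            for o in candidates
        )
        if not dominated:
            front.append(c)
    return front


# Illustrative candidate models (field names and values are hypothetical):
models = [
    {"name": "A", "n_boxes": 3, "test_error": 0.12},
    {"name": "B", "n_boxes": 5, "test_error": 0.08},
    {"name": "C", "n_boxes": 7, "test_error": 0.10},  # dominated by B
    {"name": "D", "n_boxes": 2, "test_error": 0.20},
]
best = pareto_front(models)  # keeps A, B, and D
```

Among the surviving non-dominated models, any further tie-breaking (e.g., preferring the smallest model) reflects the hypothesis that simpler models generalize better.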
Copyright information
© 2010 Springer Science+Business Media, LLC
Cite this chapter
Felici, G., Simeone, B., Spinelli, V. (2010). Classification Techniques and Error Control in Logic Mining. In: Stahlbock, R., Crone, S., Lessmann, S. (eds) Data Mining. Annals of Information Systems, vol 8. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-1280-0_5
Print ISBN: 978-1-4419-1279-4
Online ISBN: 978-1-4419-1280-0