Classification Techniques and Error Control in Logic Mining

Chapter in: Data Mining

Part of the book series: Annals of Information Systems (AOIS, volume 8)

Abstract

In this chapter we consider box clustering, a method for supervised classification that partitions the feature space with particularly simple convex sets (boxes). Box clustering produces systems of logic rules from data in numerical form; such rules explicitly represent the logic relations hidden in the data with respect to a target class. The algorithm adopted to solve the box clustering problem is a simple and fast agglomerative method whose outcome can be affected by the choice of the starting point and by the aggregation rules it adopts. We therefore propose and motivate a randomized approach that generates a large number of candidate models from different data samples and then selects the best candidate according to two criteria: model size, expressed by the number of boxes in the model, and model precision, expressed by the error on the test split. We adopt a Pareto-optimal strategy for the choice of the solution, under the hypothesis that such a choice identifies simple models with good predictive power. The procedure is applied to a wide range of well-known data sets to evaluate to what extent our results confirm this hypothesis; its performance is then compared with that of competing methods.
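The two ideas in the abstract can be sketched in a few lines of Python: a model is a set of axis-aligned boxes, a point is assigned the target class when some box covers it, and among many candidate models only those that are Pareto-optimal in (number of boxes, test error) are retained. This is a minimal illustration under stated assumptions — the `Box` class, the tuple representation of points, and the `(size, error, model)` candidate triples are hypothetical, not the authors' implementation.

```python
from dataclasses import dataclass

@dataclass
class Box:
    """An axis-aligned hyper-rectangle given by per-feature bounds."""
    lower: tuple
    upper: tuple

    def covers(self, x):
        # A point lies in the box iff every coordinate is within bounds.
        return all(lo <= v <= up for lo, v, up in zip(self.lower, x, self.upper))

def classify(model, x):
    """A point gets the target class iff some box of the model covers it."""
    return any(box.covers(x) for box in model)

def pareto_front(candidates):
    """Keep candidates not dominated on both criteria.

    candidates: list of (n_boxes, test_error, model) triples.
    A candidate is dominated if another one is at least as good on both
    criteria and strictly better on at least one.
    """
    front = []
    for size, err, model in candidates:
        dominated = any(
            s <= size and e <= err and (s, e) != (size, err)
            for s, e, _ in candidates
        )
        if not dominated:
            front.append((size, err, model))
    return front
```

For example, among candidates with (size, error) pairs (3, 0.1), (2, 0.2), (5, 0.3), and (2, 0.1), only the last survives: it matches or beats every other candidate on both criteria.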



Author information


Correspondence to Giovanni Felici, Bruno Simeone, or Vincenzo Spinelli.


Copyright information

© 2010 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Felici, G., Simeone, B., Spinelli, V. (2010). Classification Techniques and Error Control in Logic Mining. In: Stahlbock, R., Crone, S., Lessmann, S. (eds) Data Mining. Annals of Information Systems, vol 8. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-1280-0_5

  • DOI: https://doi.org/10.1007/978-1-4419-1280-0_5
  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4419-1279-4

  • Online ISBN: 978-1-4419-1280-0

  • eBook Packages: Computer Science (R0)
