
Reformulation of the support set selection problem in the logical analysis of data

Annals of Operations Research

Abstract

The paper is concerned with the problem of binary classification of data records, given an already classified training set of records. Among the various approaches to the problem, the methodology of logical analysis of data (LAD) is considered. This approach is based on discrete mathematics, with special emphasis on Boolean functions. With respect to the standard LAD procedure, enhancements based on probability considerations are presented. In particular, the problem of selecting the optimal support set is formulated as a weighted set covering problem, and testable statistical hypotheses are used. The accuracy of the modified LAD procedure is compared to that of the standard LAD procedure on datasets from the UCI repository. Encouraging results are obtained and discussed.
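
The following is a minimal sketch of the set covering model that underlies support set selection in LAD (see, e.g., Boros et al., 2000a); the attribute set B, the difference sets D(p, q), and the costs w_j are notation introduced here only for illustration, and the probability-based weights actually proposed in the paper are not reproduced:

\begin{align*}
\min\; & \sum_{j \in B} w_j\, x_j \\
\text{s.t.}\; & \sum_{j \in D(p,q)} x_j \ge 1 \quad \text{for every positive record } p \text{ and negative record } q,\\
& x_j \in \{0,1\}, \quad j \in B,
\end{align*}

where B is the set of binarized attributes, D(p, q) is the set of attributes on which records p and q take different values, x_j = 1 means that attribute j is selected for the support set, and w_j is the cost assigned to attribute j (taking w_j = 1 for all j gives the standard unweighted LAD formulation).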

References

  • Alexe, G., S. Alexe, P.L. Hammer, and A. Kogan. (2002). “Comprehensive vs. Comprehensible Classifiers in Logical Analysis of Data.” RUTCOR Research Report, RRR 9-2002; DIMACS Technical Report 2002-49; Annals of Operations Research (in print).

  • Almuallim, H. and T.G. Dietterich. (1994). “Learning Boolean Concepts in the Presence of many Irrelevant Features.” Artificial Intelligence, 69(1), 279–306.

  • Blake, C. and C.J. Merz. (1998). “UCI Repository of Machine Learning Databases.” Irvine, CA: University of California, Department of Information and Computer Science. URL: http://www.ics.uci.edu/~mlearn/MLRepository.html.

  • Boros, E., P.L. Hammer, T. Ibaraki, and A. Kogan. (1997). “Logical Analysis of Numerical Data.” Mathematical Programming, 79, 163–190.

  • Boros, E., P.L. Hammer, T. Ibaraki, A. Kogan, E. Mayoraz, and I. Muchnik. (2000a). “An Implementation of Logical Analysis of Data.” IEEE Transactions on Knowledge and Data Engineering, 12(2), 292–306.

  • Boros, E., T. Horiyama, T. Ibaraki, K. Makino, and M. Yagiura. (2000b). “Finding Essential Attributes from Binary Data.” RUTCOR Research Report, RRR 13-2000, Annals of Mathematics and Artificial Intelligence (to appear).

  • Brodley, C.E. and P.E. Utgoff. (1995). “Multivariate Decision Trees.” Machine Learning, 19, 45–77.

  • Crama, Y., P.L. Hammer, and T. Ibaraki. (1988). “Cause-effect Relationships and Partially Defined Boolean Functions.” Annals of Operations Research, 16, 299–326.

  • Fisher, M.L. (1981). “The Lagrangian Relaxation Method for Solving Integer Programming Problems.” Management Science, 27, 1–18.

  • Eklund, P.W. (2002). “A Performance Survey of Public Domain Supervised Machine Learning Algorithms.” KVO Technical Report 2002, The University of Queensland, submitted.

  • Evans, M., N. Hastings, and B. Peacock. (2000). Statistical Distributions (3rd Edn). Wiley Series in Probability and Statistics. New York: Wiley.

  • ILOG Cplex 8.0. (2002). Reference Manual. France: ILOG.

  • Hammer, P.L., A. Kogan, B. Simeone, and S. Szedmak. (2001). “Pareto-Optimal Patterns in Logical Analysis of Data.” RUTCOR Research Report, RRR 7-2001, Discrete Applied Mathematics (in print).

  • Hand, D.J., H. Mannila, and P. Smyth. (2001). Principles of Data Mining. London: MIT Press.

  • Hastie, T., R. Tibshirani, and J. Friedman. (2002). The Elements of Statistical Learning. New York, Berlin, Heidelberg: Springer-Verlag.

  • Lee, Y.J. and O.L. Mangasarian. (2001). “SSVM: A Smooth Support Vector Machine for Classification.” Computational Optimization and Applications, 20(1), 5–22.

  • Mitchell, T.M. (1997). Machine Learning. Singapore: McGraw-Hill.

  • Nemhauser, G.L. and L.A. Wolsey. (1988). Integer and Combinatorial Optimization. New York: J. Wiley.

  • Nevprop3. (1996). Nevprop3 User Manual (Nevada backPropagation, Version 3). Dept. of Internal Medicine, Electrical Engineering, and Computer Science, University of Nevada, Reno.

  • Press, W.H., S.A. Teukolsky, W.T. Vetterling, and B.P. Flannery. (1992). Numerical Recipes in C: The Art of Scientific Computing (2nd Edn). Cambridge: Cambridge University Press.

  • Quinlan, J.R. (1993). C4.5: Programs for Machine Learning. San Mateo, CA: Morgan Kaufmann.

  • Ramakrishnan, R. and J. Gehrke. (2000). Database Management Systems. McGraw-Hill.

  • Schrijver, A. (1986). Theory of Linear and Integer Programming. New York: Wiley.

  • Utgoff, P.E., N.C. Berkman, and J.A. Clouse. (1997). “Decision Tree Induction Based on Efficient Tree Restructuring.” Machine Learning, 29(1), 5–44.

Author information

Correspondence to Renato Bruni.

About this article

Cite this article

Bruni, R. Reformulation of the support set selection problem in the logical analysis of data. Ann Oper Res 150, 79–92 (2007). https://doi.org/10.1007/s10479-006-0159-8
