Abstract
The paper addresses the binary classification of data records, given a training set of records that has already been classified. Among the various approaches to the problem, it considers the methodology of the logical analysis of data (LAD). This approach is based on discrete mathematics, with special emphasis on Boolean functions. Enhancements to the standard LAD procedure, based on probability considerations, are presented. In particular, the problem of selecting the optimal support set is formulated as a weighted set covering problem. Testable statistical hypotheses are used. The accuracy of the modified LAD procedure is compared with that of the standard LAD procedure on datasets from the UCI repository. Encouraging results are obtained and discussed.
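To make the set covering view concrete, the following is a minimal sketch under stated assumptions, not the paper's procedure. It assumes the records are already binarized, that every pair of one positive and one negative record must differ on at least one selected attribute, and that each attribute has a cost. The abstract does not say how the weights are derived, so the `weights` values below are purely hypothetical, as are the function names. A simple greedy heuristic stands in for an exact integer programming solver, so the result is a valid support set but not necessarily one of minimum weight.

```python
from itertools import product


def separating_attributes(pos, neg):
    """For each (positive, negative) pair, return the set of attribute
    indices on which the two binary records differ."""
    return [
        {j for j, (a, b) in enumerate(zip(p, q)) if a != b}
        for p, q in product(pos, neg)
    ]


def greedy_weighted_cover(rows, weights):
    """Greedy heuristic for weighted set covering (illustrative only).

    rows    : list of sets; each set lists the attributes that separate
              one positive/negative pair (one covering constraint).
    weights : hypothetical cost of selecting each attribute.
    Returns a set of attribute indices that hits every row.
    """
    if any(not r for r in rows):
        raise ValueError("a positive and a negative record are identical")
    uncovered = list(rows)
    chosen = set()
    while uncovered:
        # Count how many still-uncovered pairs each attribute separates.
        gain = {}
        for r in uncovered:
            for j in r:
                gain[j] = gain.get(j, 0) + 1
        # Pick the attribute with the lowest cost per newly covered pair;
        # ties are broken by the smaller index.
        best = min(gain, key=lambda j: (weights[j] / gain[j], j))
        chosen.add(best)
        uncovered = [r for r in uncovered if best not in r]
    return chosen


# Toy binarized training set (made-up data, three attributes).
pos = [(1, 0, 1), (1, 1, 0)]
neg = [(0, 0, 1), (0, 1, 1)]
weights = [1.0, 2.0, 2.0]  # hypothetical attribute costs

rows = separating_attributes(pos, neg)
support_set = greedy_weighted_cover(rows, weights)
```

In this toy instance attribute 0 alone distinguishes every positive record from every negative one, so the sketch selects it as the support set. Setting all weights to 1 gives the unweighted covering problem.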
Cite this article
Bruni, R. Reformulation of the support set selection problem in the logical analysis of data. Ann Oper Res 150, 79–92 (2007). https://doi.org/10.1007/s10479-006-0159-8
DOI: https://doi.org/10.1007/s10479-006-0159-8