A new column generation algorithm for Logical Analysis of Data

Hansen, Pierre; Meyer, Christophe

doi:10.1007/s10479-011-0850-2

Published: 10 February 2011

Volume 188, pages 215–249 (2011)

Annals of Operations Research

Abstract

We present a new column generation algorithm for determining a classifier in the two-class LAD (Logical Analysis of Data) model. Unlike existing algorithms, which seek a classifier that simultaneously maximizes the margin of correctly classified observations and minimizes the total violation of incorrectly classified observations, we fix the margin to a difficult-to-achieve target and minimize a convex piecewise linear function of the violation of incorrectly classified observations. Moreover, a part of the training set, called the control set, is reserved to select, among all feasible classifiers found by the algorithm, the one with the highest performance on that set. One advantage of the proposed algorithm is that it requires essentially no calibration. Computational results are presented that show the effectiveness of this approach.
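
The two ideas in the abstract, a convex piecewise linear penalty on margin violations and the selection of a classifier on a held-out control set, can be illustrated with a minimal Python sketch. Nothing below is taken from the article itself: the function names, the breakpoints and slopes of the penalty, the use of accuracy as the performance measure, and the representation of a classifier as a real-valued discriminant are all assumptions made for illustration only.

```python
from typing import Callable, List, Sequence, Tuple

# Assumed representation: a classifier maps a feature vector to a real score,
# and labels are +1 or -1.
Classifier = Callable[[Sequence[int]], float]
Observation = Tuple[Sequence[int], int]


def violation(score: float, label: int, margin: float) -> float:
    """Amount by which an observation falls short of the fixed target margin."""
    return max(0.0, margin - label * score)


def piecewise_penalty(
    v: float,
    breakpoints: Sequence[float] = (0.0, 1.0),  # hypothetical values
    slopes: Sequence[float] = (1.0, 3.0),       # hypothetical values
) -> float:
    """Convex piecewise linear penalty of a violation v >= 0.

    slopes[i] applies from breakpoints[i] up to the next breakpoint (or to
    infinity for the last one). Convexity requires non-decreasing slopes.
    """
    total = 0.0
    for i, (start, slope) in enumerate(zip(breakpoints, slopes)):
        end = breakpoints[i + 1] if i + 1 < len(breakpoints) else float("inf")
        if v > start:
            total += slope * (min(v, end) - start)
    return total


def accuracy(clf: Classifier, data: List[Observation]) -> float:
    """Fraction of observations whose predicted sign matches the label."""
    correct = sum(1 for x, y in data if (1 if clf(x) > 0 else -1) == y)
    return correct / len(data)


def select_on_control_set(
    candidates: List[Classifier], control_set: List[Observation]
) -> Classifier:
    """Return the candidate with the highest accuracy on the control set."""
    return max(candidates, key=lambda clf: accuracy(clf, control_set))
```

In this sketch the candidates would be the feasible classifiers produced during the run of the algorithm; how those are generated (the column generation step itself) is not shown, since the abstract does not describe it.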

Author information

Authors and Affiliations

GERAD & Méthodes Quantitatives de Gestion, HEC Montréal, 3000 chemin de la Côte-Sainte-Catherine, Montreal, Quebec, H3T 2A7, Canada
Pierre Hansen
GERAD, HEC Montréal, 3000 chemin de la Côte-Sainte-Catherine, Montreal, Quebec, H3T 2A7, Canada
Christophe Meyer

Corresponding author

Correspondence to Pierre Hansen.

About this article

Cite this article

Hansen, P., Meyer, C. A new column generation algorithm for Logical Analysis of Data. Ann Oper Res 188, 215–249 (2011). https://doi.org/10.1007/s10479-011-0850-2

Issue Date: August 2011
