A new column generation algorithm for Logical Analysis of Data

Annals of Operations Research

Abstract

We present a new column generation algorithm for determining a classifier in the two-class LAD (Logical Analysis of Data) model. Unlike existing algorithms, which seek a classifier that simultaneously maximizes the margin of correctly classified observations and minimizes the total violation of incorrectly classified observations, we fix the margin to a difficult-to-achieve target and minimize a convex piecewise linear function of the violation of incorrectly classified observations. Moreover, part of the training set, called the control set, is reserved to select, among all feasible classifiers found by the algorithm, the one with the highest performance on that set. One advantage of the proposed algorithm is that it requires essentially no calibration. Computational results are presented that show the effectiveness of this approach.
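
The control-set selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `control_accuracy` and `select_on_control_set`, the representation of a classifier as a callable returning +1 or -1, and the use of plain accuracy as the performance measure are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple

# Assumed representations (hypothetical, not from the paper):
# an observation is a binarized attribute vector, and a classifier
# maps an observation to a class label, +1 or -1.
Observation = Sequence[int]
Classifier = Callable[[Observation], int]
LabeledSet = List[Tuple[Observation, int]]


def control_accuracy(clf: Classifier, control: LabeledSet) -> float:
    """Fraction of control-set observations that clf labels correctly."""
    correct = sum(1 for x, y in control if clf(x) == y)
    return correct / len(control)


def select_on_control_set(candidates: List[Classifier],
                          control: LabeledSet) -> Classifier:
    """Return the candidate with the highest accuracy on the control set.

    Accuracy stands in for "performance" here; that choice is an assumption.
    """
    if not candidates:
        raise ValueError("no candidate classifiers to choose from")
    return max(candidates, key=lambda clf: control_accuracy(clf, control))


# Toy control set: the label equals +1 exactly when the first attribute is 1.
control = [((1, 0), 1), ((0, 1), -1), ((1, 1), 1), ((0, 0), -1)]


def clf_a(x: Observation) -> int:
    return 1 if x[0] == 1 else -1  # tests the first attribute


def clf_b(x: Observation) -> int:
    return 1 if x[1] == 1 else -1  # tests the second attribute


best = select_on_control_set([clf_b, clf_a], control)
```

In this sketch the candidates would be the feasible classifiers found during the run; the control set is held out from the observations used to build them, so the comparison is not biased toward classifiers that merely fit the rest of the training set.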

Author information

Corresponding author

Correspondence to Pierre Hansen.

About this article

Cite this article

Hansen, P., Meyer, C. A new column generation algorithm for Logical Analysis of Data. Ann Oper Res 188, 215–249 (2011). https://doi.org/10.1007/s10479-011-0850-2

  • DOI: https://doi.org/10.1007/s10479-011-0850-2
