Multi-pattern generation framework for logical analysis of data

Annals of Operations Research

Abstract

Logical analysis of data (LAD) is a rule-based data mining method that uses combinatorial optimization and Boolean logic for binary classification. Its goal is to construct a classification model consisting of logical patterns (rules) that capture structured information from observations. Among the four steps of the LAD framework (binarization, feature selection, pattern generation, and model construction), pattern generation has been considered the most important. Most of the literature has studied combinatorial enumeration approaches that generate all possible patterns; however, these approaches suffer from a computational complexity that grows exponentially with the data (feature) size. To overcome this problem, recent studies proposed column generation-based approaches to improve the efficacy of building a LAD model with a maximum-margin objective, but solving the pattern-generating subproblems efficiently remained difficult. In this study, a new column generation framework is proposed, in which a new mixed-integer linear programming approach generates multiple patterns with maximum coverage in the subproblems at each iteration. In addition to the maximum-margin objective, we propose an alternative objective (minimum-pattern) that solves the LAD problem as a minimum set covering problem. The proposed approaches are evaluated on datasets from the University of California Irvine Machine Learning Repository. In the computational experiments, they perform comparably to previous LAD approaches and other well-known classification algorithms.
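To make the notions of pattern, coverage, and the minimum-pattern (set covering) objective concrete, the following is a minimal sketch on made-up binarized data. It is not the mixed-integer programming or column generation procedure of the paper: it assumes pure positive patterns (conjunctions of literals that cover at least one positive and no negative observation) and uses a plain greedy set-cover heuristic as a stand-in for the exact optimization. All names (`covers`, `coverage`, `is_positive_pattern`, `greedy_min_patterns`) and the toy observations are hypothetical.

```python
from typing import Dict, List, Sequence, Set

# A pattern is a conjunction of literals: feature index -> required binary value.
Pattern = Dict[int, int]


def covers(pattern: Pattern, obs: Sequence[int]) -> bool:
    """True if every literal of the pattern holds for the observation."""
    return all(obs[j] == v for j, v in pattern.items())


def coverage(pattern: Pattern, data: List[Sequence[int]]) -> Set[int]:
    """Indices of the observations in `data` that the pattern covers."""
    return {i for i, obs in enumerate(data) if covers(pattern, obs)}


def is_positive_pattern(pattern: Pattern,
                        positives: List[Sequence[int]],
                        negatives: List[Sequence[int]]) -> bool:
    """Assumed definition: covers some positive and no negative observation."""
    return bool(coverage(pattern, positives)) and not coverage(pattern, negatives)


def greedy_min_patterns(patterns: List[Pattern],
                        positives: List[Sequence[int]]) -> List[Pattern]:
    """Greedy set-cover heuristic (a stand-in, not the paper's exact method):
    repeatedly pick the pattern covering the most still-uncovered positives."""
    uncovered = set(range(len(positives)))
    chosen: List[Pattern] = []
    while uncovered:
        best = max(patterns,
                   key=lambda p: len(coverage(p, positives) & uncovered),
                   default=None)
        gained = coverage(best, positives) & uncovered if best is not None else set()
        if not gained:
            raise ValueError("candidate patterns cannot cover all positives")
        chosen.append(best)
        uncovered -= gained
    return chosen


# Toy binarized observations (made up for illustration only).
positives = [(1, 1, 0), (1, 0, 0), (0, 1, 1)]
negatives = [(0, 0, 0), (0, 0, 1)]

# Hand-written candidate patterns; {2: 1} also covers a negative observation.
candidates: List[Pattern] = [{0: 1}, {1: 1}, {0: 1, 1: 1}, {2: 1}]

valid = [p for p in candidates if is_positive_pattern(p, positives, negatives)]
selected = greedy_min_patterns(valid, positives)
print(selected)
```

On this toy input, two patterns suffice to cover all three positive observations. A greedy choice gives no optimality guarantee in general, which is one reason an exact set covering formulation is of interest.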



Acknowledgments

The authors especially thank Endre Boros, Myong-Kee Jeong, and Gianluca Gazzola from RUTCOR at Rutgers University for sharing their insightful thoughts, and the reviewers for their editorial corrections.

Author information

Corresponding author

Correspondence to Chun-An Chou.

Additional information

This research is supported in part by the SUNY Research Foundation Grant (I920247) and the National Science Foundation Grant (CCF-0546574). The second author gratefully acknowledges the partial financial support of CNPq, the Brazilian Council for Scientific and Technological Development.

Cite this article

Chou, CA., Bonates, T.O., Lee, C. et al. Multi-pattern generation framework for logical analysis of data. Ann Oper Res 249, 329–349 (2017). https://doi.org/10.1007/s10479-015-1867-8
