Abstract
Logical analysis of data (LAD) is a rule-based data mining algorithm that uses combinatorial optimization and Boolean logic for binary classification. Its goal is to construct a classification model consisting of logical patterns (rules) that capture structured information from observations. Of the four steps of the LAD framework (binarization, feature selection, pattern generation, and model construction), pattern generation has been considered the most important. The literature has mostly studied combinatorial enumeration approaches that generate all possible patterns; however, these approaches suffer from a computational complexity that grows exponentially with the data (feature) size. To overcome this problem, recent studies proposed column generation-based approaches that improve the efficacy of building a LAD model with a maximum-margin objective. Even so, efficiently solving the subproblems to generate patterns remained difficult. In this study, we propose a new column generation framework in which a new mixed-integer linear programming approach generates, in the subproblem at each iteration, multiple patterns with maximum coverage. In addition to the maximum-margin objective, we propose an alternative objective (minimum-pattern) that solves the LAD problem as a minimum set covering problem. The proposed approaches are evaluated on datasets from the University of California Irvine Machine Learning Repository. The computational experiments show performance comparable to that of previous LAD approaches and other well-known classification algorithms.
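To make the abstract's terms concrete, the following is a minimal sketch that assumes the data are already binarized. It treats a positive pattern as a conjunction of literals that covers at least one positive observation and no negative one, and it states the minimum-pattern objective as a set covering problem: choose binary variables z_p to minimize the sum of z_p subject to, for every positive observation i, the sum of z_p over patterns p covering i being at least 1. The function names (`covers`, `pure_patterns`, `minimum_pattern_cover`) and the toy data are hypothetical. This sketch deliberately uses the exhaustive enumeration approach that the abstract describes as exponential, with a brute-force cover search in place of an integer program. It is not the paper's column generation or MILP method and is practical only for tiny examples.

```python
from itertools import combinations, product

# A literal is (feature_index, required_value); a pattern is a tuple of
# literals interpreted as their conjunction (logical AND).
# Assumption: observations are already binarized tuples of 0/1 values.


def covers(pattern, x):
    """True if binary observation x satisfies every literal in the pattern."""
    return all(x[j] == v for j, v in pattern)


def pure_patterns(pos, neg, max_degree):
    """Exhaustively enumerate conjunctions of up to max_degree literals that
    cover at least one positive and no negative observation."""
    n = len(pos[0])
    found = []
    for d in range(1, max_degree + 1):
        for feats in combinations(range(n), d):
            for vals in product((0, 1), repeat=d):
                p = tuple(zip(feats, vals))
                if any(covers(p, x) for x in neg):
                    continue  # covers a negative observation: not pure
                if any(covers(p, x) for x in pos):
                    found.append(p)
    return found


def minimum_pattern_cover(pos, patterns):
    """Exact minimum set cover by brute force: the smallest subset of patterns
    such that every positive observation is covered by at least one of them."""
    for k in range(1, len(patterns) + 1):
        for subset in combinations(patterns, k):
            if all(any(covers(p, x) for p in subset) for x in pos):
                return list(subset)
    return None  # some positive observation is covered by no pure pattern


if __name__ == "__main__":
    pos = [(1, 1, 0), (1, 0, 0), (1, 1, 1)]
    neg = [(0, 1, 0), (0, 0, 1), (0, 1, 1)]
    pats = pure_patterns(pos, neg, max_degree=2)
    print(minimum_pattern_cover(pos, pats))  # prints [((0, 1),)]
```

In this toy data, feature 0 alone separates the classes, so a single degree-1 pattern forms the minimum cover. The exponential loop over degrees, features, and values in `pure_patterns` is the cost that the column generation approaches described in the abstract are meant to avoid.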
Acknowledgments
The authors especially thank Endre Boros, Myong-Kee Jeong, and Gianluca Gazzola from RUTCOR at Rutgers University for sharing their insightful thoughts, and the reviewers for their editorial corrections.
Additional information
This research is supported in part by the SUNY Research Foundation Grant (I920247) and the National Science Foundation Grant (CCF-0546574). The second author gratefully acknowledges the partial financial support of CNPq, the Brazilian Council for Scientific and Technological Development.
About this article
Cite this article
Chou, CA., Bonates, T.O., Lee, C. et al. Multi-pattern generation framework for logical analysis of data. Ann Oper Res 249, 329–349 (2017). https://doi.org/10.1007/s10479-015-1867-8