Multi-pattern generation framework for logical analysis of data

Chou, Chun-An; Bonates, Tibérius O.; Lee, Chungmok; Chaovalitwongse, Wanpracha Art

doi:10.1007/s10479-015-1867-8

Published: 19 April 2015

Volume 249, pages 329–349 (2017)

Annals of Operations Research

Abstract

Logical analysis of data (LAD) is a rule-based data mining method that uses combinatorial optimization and Boolean logic for binary classification. The goal is to construct a classification model consisting of logical patterns (rules) that capture structured information from observations. Of the four steps of the LAD framework (binarization, feature selection, pattern generation, and model construction), pattern generation has been considered the most important. Most of the literature has studied combinatorial enumeration approaches that generate all possible patterns; however, the computational cost of these approaches grows exponentially with the data (feature) size. To overcome this problem, recent studies proposed column generation-based approaches to build a LAD model with a maximum-margin objective more efficiently, but solving the subproblems that generate patterns remained difficult. In this study, a new column generation framework is proposed, in which a new mixed-integer linear programming approach generates multiple patterns with maximum coverage in the subproblems at each iteration. In addition to the maximum-margin objective, we propose an alternative objective (minimum-pattern) that treats the LAD problem as a minimum set covering problem. The proposed approaches are evaluated on datasets from the University of California Irvine Machine Learning Repository. In the computational experiments, they perform comparably to previous LAD methods and other well-known classification algorithms.
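To make the abstract's terms concrete, here is a minimal sketch. It is not the authors' column generation framework or their mixed-integer formulation. It assumes a tiny hypothetical binarized dataset, enumerates pure positive patterns by brute force (the exponential approach the abstract contrasts against), and then selects a small covering set with the standard greedy heuristic for set covering, used here only as a stand-in for the minimum-pattern objective. All names (`positives`, `negatives`, `covers`, `enumerate_patterns`, `greedy_cover`) are illustrative.

```python
from itertools import combinations, product

# Hypothetical binarized observations: tuples of 0/1 feature values.
positives = [(1, 0, 1), (1, 1, 1), (0, 1, 1)]
negatives = [(0, 0, 1), (0, 0, 0)]


def covers(pattern, obs):
    """A pattern maps feature index -> required value; it covers obs if every literal matches."""
    return all(obs[j] == v for j, v in pattern.items())


def is_pure_positive(pattern):
    """True if the pattern covers at least one positive and no negative observation."""
    return (any(covers(pattern, p) for p in positives)
            and not any(covers(pattern, n) for n in negatives))


def enumerate_patterns(max_degree=2):
    """Brute-force enumeration of pure positive patterns with at most max_degree literals."""
    n_features = len(positives[0])
    found = []
    for degree in range(1, max_degree + 1):
        for idx in combinations(range(n_features), degree):
            for bits in product((0, 1), repeat=degree):
                pattern = dict(zip(idx, bits))
                if is_pure_positive(pattern):
                    found.append(pattern)
    return found


def greedy_cover(patterns):
    """Greedy set covering: repeatedly take the pattern covering the most uncovered positives."""
    uncovered = set(range(len(positives)))
    chosen = []
    while uncovered:
        best = max(patterns,
                   key=lambda p: sum(covers(p, positives[i]) for i in uncovered))
        newly = {i for i in uncovered if covers(best, positives[i])}
        if not newly:  # no remaining pattern covers anything new
            break
        chosen.append(best)
        uncovered -= newly
    return chosen


patterns = enumerate_patterns(max_degree=2)
chosen = greedy_cover(patterns)
print(len(patterns), "pure positive patterns;", len(chosen), "selected:", chosen)
```

The enumeration loop visits every combination of features and values up to the chosen degree, which is why this style of pattern generation scales poorly. The greedy step gives no optimality guarantee; an exact minimum set cover would need an integer program.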

Acknowledgments

The authors especially thank Endre Boros, Myong-Kee Jeong, and Gianluca Gazzola from RUTCOR at Rutgers University for sharing their insightful thoughts, and the reviewers for their editorial corrections.

Author information

Authors and Affiliations

Department of Systems Science and Industrial Engineering, SUNY Binghamton, Vestal, NY, USA
Chun-An Chou
Departments of Industrial and Systems Engineering and Radiology, University of Washington, Seattle, WA, USA
Wanpracha Art Chaovalitwongse
Department of Statistics and Applied Mathematics, Federal University of Ceara, Fortaleza, CE, Brazil
Tibérius O. Bonates
Department of Industrial and Management Engineering, Hankuk University of Foreign Studies, Yongin, Gyeonggi-do, 449-791, Republic of Korea
Chungmok Lee

Corresponding author

Correspondence to Chun-An Chou.

Additional information

This research is supported in part by the SUNY Research Foundation Grant (I920247) and the National Science Foundation Grant (CCF-0546574). The second author gratefully acknowledges the partial financial support of CNPq, the Brazilian Council for Scientific and Technological Development.

About this article

Cite this article

Chou, CA., Bonates, T.O., Lee, C. et al. Multi-pattern generation framework for logical analysis of data. Ann Oper Res 249, 329–349 (2017). https://doi.org/10.1007/s10479-015-1867-8

Issue Date: February 2017
