Pattern-based feature selection in genomics and proteomics

Annals of Operations Research

Abstract

A major difficulty in bioinformatics stems from the size of the datasets, which frequently contain large numbers of variables. In this study, we present a two-step procedure for feature selection. In the first, “filtering” stage, a relatively small subset of features is identified on the basis of several criteria. In the second stage, the importance of the selected variables is evaluated based on the frequency of their participation in relevant patterns, and low-impact variables are eliminated. This step is applied iteratively until arriving at a Pareto-optimal “support set”, which balances the conflicting criteria of simplicity and accuracy.
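The two-stage idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the filter criterion (difference of class means), the per-feature mean used as a cutpoint, the restriction to patterns of degree at most 2, and the coverage-based stopping rule are all assumptions made for illustration, and every function name is hypothetical. The loop is a greedy stand-in for the search for a Pareto-optimal support set.

```python
from itertools import combinations
from statistics import mean


def filter_stage(X, y, k):
    """Stage 1 (assumed criterion): keep the k features whose class means differ most."""
    def separation(j):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        return abs(mean(pos) - mean(neg))
    ranked = sorted(range(len(X[0])), key=separation, reverse=True)
    return sorted(ranked[:k])


def binarize(X, features):
    """Map each kept feature to 0/1: is the value above the feature's mean? (assumed cutpoint)"""
    cut = {j: mean(row[j] for row in X) for j in features}
    return [{j: int(row[j] > cut[j]) for j in features} for row in X]


def pure_patterns(B, y, features, max_degree=2):
    """Enumerate conjunctions of up to max_degree conditions that cover
    observations of one class only. Returns (features, values, class) triples."""
    patterns = []
    for degree in range(1, max_degree + 1):
        for combo in combinations(features, degree):
            seen = {}  # value tuple -> set of class labels it covers
            for obs, label in zip(B, y):
                seen.setdefault(tuple(obs[j] for j in combo), set()).add(label)
            for key, labels in seen.items():
                if len(labels) == 1:
                    patterns.append((combo, key, next(iter(labels))))
    return patterns


def importance(patterns, features):
    """Importance of a feature = fraction of patterns in which it participates."""
    counts = {j: 0 for j in features}
    for combo, _, _ in patterns:
        for j in combo:
            counts[j] += 1
    total = max(len(patterns), 1)
    return {j: counts[j] / total for j in features}


def coverage(B, patterns):
    """Fraction of observations covered by at least one pure pattern (accuracy proxy)."""
    covered = sum(
        any(all(obs[j] == v for j, v in zip(combo, key)) for combo, key, _ in patterns)
        for obs in B
    )
    return covered / len(B)


def select_support_set(X, y, k=4, min_coverage=1.0):
    """Stage 2: repeatedly drop the least important feature while coverage holds."""
    features = filter_stage(X, y, k)
    B = binarize(X, features)
    while len(features) > 1:
        imp = importance(pure_patterns(B, y, features), features)
        weakest = min(features, key=lambda j: imp[j])
        trial = [j for j in features if j != weakest]
        if coverage(B, pure_patterns(B, y, trial)) < min_coverage:
            break  # removing this feature would cost accuracy; stop here
        features = trial
    return features
```

Calling `select_support_set(X, y, k=3)` on a list of numeric rows `X` with 0/1 labels `y` returns the indices of the retained features. Lowering `min_coverage` trades accuracy for a smaller set, which is the simplicity/accuracy tension the abstract refers to.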

Author information

Corresponding author

Correspondence to Peter L. Hammer.

Additional information

All authors contributed equally to this manuscript.

About this article

Cite this article

Alexe, G., Alexe, S., Hammer, P.L. et al. Pattern-based feature selection in genomics and proteomics. Ann Oper Res 148, 189–201 (2006). https://doi.org/10.1007/s10479-006-0084-x

  • DOI: https://doi.org/10.1007/s10479-006-0084-x
