Proteomics Versus Clinical Data and Stochastic Local Search Based Feature Selection for Acute Myeloid Leukemia Patients’ Classification

  • Systems-Level Quality Improvement
Journal of Medical Systems

Abstract

Data from high-throughput technologies have been widely used in drug target problems over the last decades. This study proposes a meta-heuristic framework that combines stochastic local search (SLS) with random forest (RF) to identify the genes and proteins that lead to the best classification of Acute Myeloid Leukemia (AML) patients. First, we use SLS as a feature selection technique to select the most significant proteins for the classification step. Then we apply RF to classify new patients into their corresponding classes. For evaluation, we train the RF classifier on the training data to obtain a model and then apply this model to the test data to predict each patient's class. We measure performance with the balanced accuracy (BAC) and the area under the receiver operating characteristic curve (AUROC). The proposed method is evaluated on the dataset from the DREAM 9 challenge and compared with a pure random forest (without feature selection) and with the two best-ranked results of the DREAM 9 challenge. We used three types of data: clinical data only, proteomics data only, and clinical and proteomics data combined. The numerical results show that the highest scores are obtained with clinical data alone and the lowest with proteomics data alone. Further, our method achieves promising results compared to the methods presented in the DREAM challenge.
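The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: this preview does not give the neighbourhood, the parameters, or the tooling, so the single-flip neighbourhood, the names `sls_feature_selection`, `walk_prob` and `toy_score`, and the default values are all assumptions. The `evaluate` callback stands in for the RF step; in a real setting it might return the BAC of a random forest trained on the selected columns. The two metric functions follow the standard definitions of BAC and AUROC for binary labels.

```python
import random

def balanced_accuracy(y_true, y_pred):
    """BAC for binary labels 0/1: mean of sensitivity and specificity."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    return 0.5 * (tp / pos + tn / neg)

def auroc(y_true, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly;
    tied scores count as half a correct pair."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def sls_feature_selection(n_features, evaluate, max_iter=200,
                          walk_prob=0.3, seed=0):
    """Stochastic local search over boolean feature masks (assumed variant).

    With probability walk_prob a random feature is flipped (diversification);
    otherwise the best single-flip neighbour is taken (intensification).
    The best mask seen so far is remembered and returned with its score.
    """
    rng = random.Random(seed)
    current = [rng.random() < 0.5 for _ in range(n_features)]
    best, best_score = current[:], evaluate(current)
    for _ in range(max_iter):
        if rng.random() < walk_prob:
            neighbour = current[:]
            i = rng.randrange(n_features)
            neighbour[i] = not neighbour[i]
        else:
            candidates = []
            for i in range(n_features):
                cand = current[:]
                cand[i] = not cand[i]
                candidates.append(cand)
            neighbour = max(candidates, key=evaluate)
        current = neighbour
        score = evaluate(current)
        if score > best_score:
            best, best_score = current[:], score
    return best, best_score

# Hypothetical stand-in for "BAC of an RF trained on the selected features":
# features 0 and 3 are informative, every other selected feature costs 0.1.
def toy_score(mask):
    informative = {0, 3}
    gain = sum(1 for i, m in enumerate(mask) if m and i in informative)
    noise = sum(1 for i, m in enumerate(mask) if m and i not in informative)
    return gain - 0.1 * noise

if __name__ == "__main__":
    mask, score = sls_feature_selection(6, toy_score)
    print("selected features:", [i for i, m in enumerate(mask) if m])
    print("score:", score)
```

Keeping the search independent of the scorer means the same loop could be run on clinical features, proteomics features, or both, by changing only what `evaluate` trains on.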

Notes

  1. http://dreamchallenges.org/project/dream-9-acute-myeloid-leukemia-aml-outcome-prediction/

  2. These data were provided by Dr. Steven Kornblau from The University of Texas MD Anderson Cancer Center and were obtained through Synapse syn2455683 as part of the AML DREAM Challenge.

  3. http://www.pf-bird.univ-nantes.fr/

Acknowledgements

The authors thank the DREAM 9 Challenge for making the data publicly available. We also acknowledge the PROFAS B+ program and the one-month research stay provided by the University of Tizi Ouzou. We are most grateful to the bioinformatics core facility of Nantes (BiRD - Biogenouest) for its technical support.

Author information

Corresponding author

Correspondence to Lokmane Chebouba.

Ethics declarations

Conflict of interests

Author L. CHEBOUBA declares that he has no conflict of interest. Author D. BOUGHACI declares that she has no conflict of interest. Author C. GUZIOLOWSKI declares that she has no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

This article is part of the Topical Collection on Systems-Level Quality Improvement

About this article

Cite this article

Chebouba, L., Boughaci, D. & Guziolowski, C. Proteomics Versus Clinical Data and Stochastic Local Search Based Feature Selection for Acute Myeloid Leukemia Patients’ Classification. J Med Syst 42, 129 (2018). https://doi.org/10.1007/s10916-018-0972-z

  • DOI: https://doi.org/10.1007/s10916-018-0972-z
