A mixed integer programming-based global optimization framework for analyzing gene expression data

Journal of Global Optimization

Abstract

The analysis of high-throughput gene expression experiments comparing patients and controls is based on identifying differentially expressed genes with standard statistical tests. A typical bioinformatics approach to this problem consists of two separate steps: first, a subset of genes with altered expression levels is identified; then the pathways that are statistically enriched by those genes are selected, on the assumption that they play a relevant role in the biological condition under study. Often, the set of selected pathways contains elements that are not related to the condition, because statistical significance is not sufficient for biological relevance. To overcome this problem, we propose a method based on a large mixed integer program that implements a new feature selection model to simultaneously identify the genes whose over- and under-expression, combined, discriminate between different cancer subtypes, as well as the pathways that are enriched by these genes. The innovation in this model is that the solutions are driven towards the enrichment of pathways. This may introduce a bias in the search; the bias is counterbalanced by a wide exploration of the solution space, varying the parameters involved over their feasible region, and then using a global optimization approach. The joint analysis of the pool of solutions obtained by this exploration should provide a robust final set of genes and pathways, overcoming the potential drawbacks of relying solely on statistical significance. Experimental results on transcriptomes for different types of cancer from The Cancer Genome Atlas are presented. The method is able to identify crisp relations between the considered cancer subtypes and a few selected pathways, which are eventually validated by biological analysis.
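
For readers unfamiliar with the conventional second step mentioned in the abstract (selecting pathways that are statistically enriched by a gene list), the following is a minimal sketch of one common way to score enrichment: an upper-tail hypergeometric (over-representation) test. The abstract does not say which enrichment statistic the authors use, so the choice of test, the function name `enrichment_p_value`, its parameters, and the toy numbers are all assumptions made for illustration. This is not the mixed integer programming model proposed in the article.

```python
from math import comb


def enrichment_p_value(n_background, n_pathway, n_selected, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    Hypothetical helper, not taken from the article.
    n_background: total number of genes measured
    n_pathway:    genes in the background that belong to the pathway
    n_selected:   genes in the selected (e.g. differentially expressed) list
    n_overlap:    selected genes that also belong to the pathway
    """
    # Number of ways to draw the selected list from the background
    total = comb(n_background, n_selected)
    # The overlap cannot exceed the pathway size or the list size
    upper = min(n_pathway, n_selected)
    # Count the draws whose overlap is at least as large as the one observed
    tail = sum(
        comb(n_pathway, i) * comb(n_background - n_pathway, n_selected - i)
        for i in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy numbers (assumed): 20 background genes, 5 in the pathway,
# 4 selected genes, 3 of which fall in the pathway.
p = enrichment_p_value(20, 5, 4, 3)
print(f"p = {p:.4f}")
```

A small p-value only says that the overlap is unlikely under random selection. As the abstract argues, this alone does not establish that the pathway is biologically relevant to the condition.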

Acknowledgements

This work was funded by the INTEROMICS Italian flagship project (PON02-00612-3461281 and PON02-00619-3470457) and by the SysBioNet project, a MIUR initiative from the Italian Roadmap Research Infrastructures 2012. Mario R. Guarracino's work was conducted at the National Research University Higher School of Economics and supported by RSF Grant 14-41-00039.

Corresponding author

Correspondence to Kumar Parijat Tripathi.

Cite this article

Felici, G., Tripathi, K.P., Evangelista, D. et al. A mixed integer programming-based global optimization framework for analyzing gene expression data. J Glob Optim 69, 727–744 (2017). https://doi.org/10.1007/s10898-017-0530-0


  • DOI: https://doi.org/10.1007/s10898-017-0530-0
