
Fuzzy measure with regularization for gene selection and cancer prediction

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

Dealing with high-dimensional gene expression data is challenging, and selecting informative subsets of genes is crucial for cancer classification. Many statistical and machine learning methods with regularization have been developed for this purpose. However, these methods neglect epistasis, i.e., the fact that some genes may mask or modify the effects of other genes. In this article, we propose a fuzzy measure with regularization, known as FMR, which uses a fuzzy measure to describe the interactions between genes and adopts L1 and L1/2 norms to obtain sparse solutions. Regularization with L1 and L1/2 yields a series of sparse solutions, which allows the fuzzy measure to be determined faster than with traditional methods such as genetic algorithms. FMR obtains the subset of genes corresponding to the fewest nonzero fuzzy measure values and then selects the important gene(s) according to their frequency of appearance in the selected gene subsets. Three base classifiers, SVM, KNN and DBN, are employed as underlying models to verify the effectiveness of the selected gene subset(s). Experimental results indicate that the genes selected by FMR are consistent with several clinical studies. In addition, FMR achieves accuracy comparable to that of other methods reported in the literature. The code used in this article is freely available at: https://github.com/wangphoenix/ICMLC.
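The abstract describes FMR only at a high level. The following is a minimal pure-Python sketch of one way to combine the ingredients it names; it is not the authors' implementation (their code is in the repository linked above) and rests on several assumptions. The fuzzy measure is limited to interactions of at most two genes (a 2-additive measure in Möbius form), and its monotonicity constraints are not enforced. The Choquet integral is written as the sum over gene subsets A of m(A)·min_{i∈A} x_i, which makes the model linear in the Möbius coefficients m(A). The sparse fit uses iterative soft thresholding for the L1 penalty and the half-thresholding operator proposed for L1/2 regularization by Xu et al. Finally, "frequency of appearance" is approximated by counting how often each gene appears in the solutions along a grid of λ values. The function names `soft`, `half`, `fit_sparse` and `select_genes` are hypothetical.

```python
import itertools
import math
import random
from collections import Counter


def soft(t, lam):
    """Scalar minimiser of (x - t)**2 + lam*|x|: soft thresholding at lam/2."""
    return math.copysign(max(abs(t) - lam / 2, 0.0), t)


def half(t, lam):
    """Scalar minimiser of (x - t)**2 + lam*|x|**0.5: the half-thresholding
    operator used for L1/2 regularization (closed-form cubic root)."""
    if abs(t) <= (54 ** (1 / 3) / 4) * lam ** (2 / 3):
        return 0.0
    phi = math.acos((lam / 8) * (abs(t) / 3) ** -1.5)
    return (2 / 3) * t * (1 + math.cos(2 * math.pi / 3 - 2 * phi / 3))


def fit_sparse(F, y, lam, penalty="l1", iters=300):
    """Iterative thresholding for min ||y - F w||^2 + lam * R(w), where R is the
    L1 norm (penalty="l1") or sum_j |w_j|^(1/2) (penalty="half")."""
    rows, cols = len(F), len(F[0])
    shrink = soft if penalty == "l1" else half
    # Step size mu < 1 / ||F||_2^2; the squared Frobenius norm is an upper bound.
    mu = 0.9 / sum(v * v for row in F for v in row)
    w = [0.0] * cols
    for _ in range(iters):
        r = [y[i] - sum(F[i][j] * w[j] for j in range(cols)) for i in range(rows)]
        # Gradient step on the squared error, then the penalty's thresholding.
        w = [shrink(w[j] + mu * sum(F[i][j] * r[i] for i in range(rows)), lam * mu)
             for j in range(cols)]
    return w


def select_genes(X, y, lams, penalty="l1", max_order=2, tol=1e-8):
    """Count how often each gene appears in the sparse solutions along a lambda path
    (a hypothetical stand-in for FMR's frequency-based selection)."""
    n = len(X[0])
    # Each coefficient is the Mobius mass m(A) of a gene subset A with |A| <= max_order.
    subsets = [A for k in range(1, max_order + 1)
               for A in itertools.combinations(range(n), k)]
    # Choquet integral in Mobius form: sum_A m(A) * min_{i in A} x_i.
    F = [[min(row[i] for i in A) for A in subsets] for row in X]
    # Centre features and target so that no intercept term is needed (assumption).
    means = [sum(col) / len(F) for col in zip(*F)]
    F = [[v - m for v, m in zip(row, means)] for row in F]
    y_mean = sum(y) / len(y)
    yc = [v - y_mean for v in y]
    counts = Counter()
    for lam in lams:
        w = fit_sparse(F, yc, lam, penalty)
        # Genes touched by nonzero Mobius coefficients form this lambda's subset.
        counts.update({i for A, wA in zip(subsets, w) if abs(wA) > tol for i in A})
    return counts


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic toy data: genes 0 and 1 interact through min(), gene 3 acts additively.
    X = [[rng.random() for _ in range(5)] for _ in range(40)]
    y = [min(r[0], r[1]) + 0.5 * r[3] + 0.01 * rng.gauss(0, 1) for r in X]
    print(select_genes(X, y, lams=[0.5, 0.2, 0.1, 0.05], penalty="half"))
```

Counting genes over a whole λ path is a simplification of "the subset corresponding to the fewest nonzero fuzzy measure values"; a closer variant might keep only the sparsest nonempty solution per resample of the data and count across resamples. In either case, the top-ranked genes would then be passed to a classifier such as SVM or KNN for evaluation.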



Notes

  1. http://genomics-pubs.princeton.edu/oncology.


Acknowledgements

We would like to thank Prof. Xi-Zhao Wang of Shenzhen University for his support and for revising the manuscript.

Funding

Funding was provided by the Science and Technology Planning Project of Guangdong Province of China (Grant no. 2017A040406023).

Author information


Corresponding author

Correspondence to JinFeng Wang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Wang, J., He, Z., Huang, S. et al. Fuzzy measure with regularization for gene selection and cancer prediction. Int. J. Mach. Learn. & Cyber. 12, 2389–2405 (2021). https://doi.org/10.1007/s13042-021-01319-3


  • DOI: https://doi.org/10.1007/s13042-021-01319-3
