Abstract
Dealing with high-dimensional gene expression data is challenging, and selecting multiple informative subsets of genes is crucial for cancer classification. Many statistical and machine learning methods with regularization have been developed for this purpose. However, these methods neglect epistasis, i.e., the fact that some genes may mask or affect other genes. In this article, we propose a fuzzy measure with regularization (FMR), which adopts the L1 and L1/2 norms to obtain sparse solutions, to describe the interaction between genes. Regularization with L1 and L1/2 yields a series of sparse solutions, which helps solve for the fuzzy measure faster than traditional methods such as the genetic algorithm. FMR obtains a subset of genes corresponding to the fewest nonzero fuzzy measure values and then selects the important gene(s) according to how frequently they appear in the selected gene subsets. In addition, three base classifiers (SVM, KNN and DBN) are employed as underlying models to verify the effectiveness of the selected subset(s) of genes. Experimental results indicate that the genes selected by FMR are consistent with several clinical studies. FMR also achieves accuracy comparable to that of other methods reported in the literature. The code used in this article is freely available at: https://github.com/wangphoenix/ICMLC.
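The selection idea described above can be illustrated with a minimal sketch. It is not the authors' implementation (see the repository linked above for that). It assumes that the fuzzy-measure model can be written as a linear model over gene subsets, with one column per subset holding the minimum expression over the genes in that subset (the Möbius form of a Choquet integral), and it swaps in a plain proximal-gradient (ISTA) solver for the L1 penalty only; the L1/2 penalty is not covered. All function names (`subset_features`, `lasso_ista`, `gene_frequency`) and the `max_size` cap on subset size are hypothetical.

```python
from collections import Counter
from itertools import combinations

import numpy as np


def subset_features(X, max_size=2):
    """Return gene subsets (up to max_size genes) and one column per subset.

    Assumption: each column is the row-wise minimum over the genes in the
    subset, i.e. the Moebius form of a Choquet integral.
    """
    n_genes = X.shape[1]
    subsets = [s for k in range(1, max_size + 1)
               for s in combinations(range(n_genes), k)]
    Z = np.column_stack([X[:, list(s)].min(axis=1) for s in subsets])
    return subsets, Z


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (element-wise shrinkage toward 0)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def lasso_ista(Z, y, lam, n_iter=500):
    """Minimise (1 / 2n) * ||y - Z w||^2 + lam * ||w||_1 by proximal gradient."""
    n = Z.shape[0]
    step = n / (np.linalg.norm(Z, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    w = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        grad = Z.T @ (Z @ w - y) / n
        w = soft_threshold(w - step * grad, step * lam)
    return w


def gene_frequency(subsets, w, tol=1e-8):
    """Count how often each gene appears in subsets with a nonzero coefficient."""
    counts = Counter()
    for subset, value in zip(subsets, w):
        if abs(value) > tol:
            counts.update(subset)
    return counts


if __name__ == "__main__":
    # Synthetic data only: genes 0 and 1 interact, gene 3 acts alone.
    rng = np.random.default_rng(0)
    X = rng.random((60, 5))
    y = 2.0 * np.minimum(X[:, 0], X[:, 1]) + X[:, 3]

    subsets, Z = subset_features(X, max_size=2)
    total = Counter()
    for lam in (0.05, 0.02, 0.01):  # a short series of sparse solutions
        total += gene_frequency(subsets, lasso_ista(Z, y, lam))
    print(total.most_common())
```

Ranking genes by their count across several penalty strengths mirrors the frequency-of-appearance criterion in the abstract; capping the subset size is a simplification made here to keep the number of columns manageable.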
Acknowledgements
We would like to thank Prof. Xi-Zhao Wang of Shenzhen University for his support and revision.
Funding
Funding was provided by the Science and Technology Planning Project of Guangdong Province of China (Grant no. 2017A040406023).
Cite this article
Wang, J., He, Z., Huang, S. et al. Fuzzy measure with regularization for gene selection and cancer prediction. Int. J. Mach. Learn. & Cyber. 12, 2389–2405 (2021). https://doi.org/10.1007/s13042-021-01319-3
DOI: https://doi.org/10.1007/s13042-021-01319-3