Fuzzy measure with regularization for gene selection and cancer prediction

Wang, JinFeng; He, ZhenYu; Huang, ShuaiHui; Chen, Hao; Wang, WenZhong; Pourpanah, Farhad

doi:10.1007/s13042-021-01319-3

Original Article
Published: 20 April 2021

Volume 12, pages 2389–2405, (2021)
International Journal of Machine Learning and Cybernetics

JinFeng Wang ORCID: orcid.org/0000-0002-1246-4617¹,
ZhenYu He¹,
ShuaiHui Huang¹,
Hao Chen¹,
WenZhong Wang² &
…
Farhad Pourpanah³

Abstract

High-dimensional gene expression data are difficult to analyze, and cancer classification depends on selecting multiple informative subsets of genes. Many statistical and machine learning methods with regularization have been developed for this purpose. However, these methods neglect epistasis, i.e., the fact that some genes may mask or affect other genes. In this article, we propose a fuzzy measure with regularization (FMR), which adopts the L₁ and L_{1/2} norms to obtain sparse solutions and uses the fuzzy measure to describe the interaction between genes. Regularization with L₁ and L_{1/2} yields a series of sparse solutions, which helps to solve for the fuzzy measure faster than traditional methods such as the genetic algorithm. FMR obtains the subsets of genes corresponding to the fewest nonzero fuzzy measure values and then selects the important gene(s) according to how frequently they appear in the selected gene subsets. In addition, three base classifiers (SVM, KNN and DBN) are employed as underlying models to verify the effectiveness of the selected subset(s) of genes. Experimental results indicate that the genes selected by FMR are consistent with several clinical studies, and that FMR achieves accuracy comparable to that of other methods reported in the literature. The code used in this article is freely available at: https://github.com/wangphoenix/ICMLC.
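The frequency-based selection step described in the abstract can be illustrated with a minimal sketch. It assumes that a sparse fuzzy measure is already available as a mapping from gene subsets to values; the function name `rank_genes_by_frequency`, the tolerance `tol`, the gene labels and the example values are all hypothetical and are not taken from the authors' repository.

```python
from collections import Counter


def rank_genes_by_frequency(measure, tol=1e-8):
    """Rank genes by how often they occur in subsets with a nonzero measure value.

    `measure` maps a frozenset of gene names to its (assumed) fuzzy-measure value.
    Values with magnitude <= tol are treated as zero, i.e. removed by sparsity.
    """
    counts = Counter()
    for subset, value in measure.items():
        if abs(value) > tol:
            # Every gene in a surviving subset gets one count.
            counts.update(subset)
    # Sort by descending frequency, then by name for a deterministic order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical sparse solution over subsets of three genes (values are made up).
example_measure = {
    frozenset({"g1"}): 0.0,
    frozenset({"g2"}): 0.4,
    frozenset({"g1", "g2"}): 0.7,
    frozenset({"g2", "g3"}): 0.0,
    frozenset({"g1", "g2", "g3"}): 1.0,
}

ranking = rank_genes_by_frequency(example_measure)
print(ranking)
```

In this toy input the gene `g2` appears in all three subsets with a nonzero value, so it is ranked first. How the sparse measure itself is fitted with the L₁ or L_{1/2} penalty is outside the scope of this sketch.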

Notes

http://genomics-pubs.princeton.edu/oncology.

Acknowledgements

We would like to thank Prof. Xi-Zhao Wang of Shenzhen University for his support and revisions.

Funding

Funding was provided by the Science and Technology Planning Project of Guangdong Province of China (Grant no. 2017A040406023).

Author information

Authors and Affiliations

College of Mathematics and Informatics, South China Agricultural University, Guangzhou, 510642, China
JinFeng Wang, ZhenYu He, ShuaiHui Huang & Hao Chen
College of Economics and Management, South China Agricultural University, Guangzhou, 510642, China
WenZhong Wang
College of Mathematics and Statistics, Guangdong Key Lab. of Intelligent Information Processing, Shenzhen University, Shenzhen, 518060, China
Farhad Pourpanah

Corresponding author

Correspondence to JinFeng Wang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wang, J., He, Z., Huang, S. et al. Fuzzy measure with regularization for gene selection and cancer prediction. Int. J. Mach. Learn. & Cyber. 12, 2389–2405 (2021). https://doi.org/10.1007/s13042-021-01319-3

Received: 26 August 2020
Accepted: 30 March 2021
Published: 20 April 2021
Issue Date: August 2021
DOI: https://doi.org/10.1007/s13042-021-01319-3
