Uncertainty measurement for a gene space based on class-consistent technology: an application in gene selection

Applied Intelligence

Abstract

With the development of data mining, artificial intelligence, neural networks, expert systems and machine learning, the information system (i-system) has become increasingly important. If the objects, attributes and information values in an i-system are replaced by cells, genes and gene expression values, respectively, then the i-system is said to be a gene space. Because gene expression data are characterized by small sample sizes, high dimensionality and noise, a gene space carries considerable uncertainty, and traditional machine learning and statistical methods often handle it poorly. Granular computing (GrC) can effectively deal with various uncertainties. This paper studies uncertainty measurement for a gene space based on class-consistent technology and discusses its application to gene selection from the perspective of GrC. First, a class-consistent relation between cells in a gene space is established from the gene expression values of the cells. Then, information granules (i-granules) are obtained from the gene space by means of this relation. Next, two metrics, information granularity and information entropy, are defined to measure the uncertainty of a gene space, and their properties are investigated. Numerical experiments and statistical tests verify their effectiveness. Furthermore, two gene selection algorithms are proposed as an application of these metrics. Finally, clustering experiments and statistical tests on 16 gene spaces show that the proposed gene selection algorithms outperform some state-of-the-art feature selection algorithms on three clustering performance indicators.
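
The pipeline described in the abstract (relation between cells, i-granules, then two uncertainty metrics) can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not give the class-consistent relation or the exact metric formulas, so the sketch assumes a hypothetical relation (two cells are related when their expression values fall in the same fixed-width bin on every selected gene) and uses the granularity and entropy forms that are common in the GrC literature. The names `granules`, `width` and the toy matrix `expr` are illustrative only.

```python
import math

def granules(expr, genes, width=1.0):
    """Return, for each cell, the set of cells related to it (its i-granule).

    ASSUMPTION (not from the paper): two cells are related when, on every
    selected gene, their expression values fall in the same bin of size `width`.
    """
    n = len(expr)
    keys = [tuple(math.floor(expr[i][g] / width) for g in genes) for i in range(n)]
    return [{j for j in range(n) if keys[j] == keys[i]} for i in range(n)]

def information_granularity(grans):
    """Assumed form: G = (1/n^2) * sum_i |granule_i|; larger means coarser."""
    n = len(grans)
    return sum(len(g) for g in grans) / (n * n)

def information_entropy(grans):
    """Assumed form: H = -(1/n) * sum_i log2(|granule_i| / n); larger means finer."""
    n = len(grans)
    return -sum(math.log2(len(g) / n) for g in grans) / n

# Toy gene space: 4 cells (rows) x 2 genes (columns); values are made up.
expr = [
    [0.2, 3.1],
    [0.4, 3.7],
    [1.5, 3.2],
    [1.9, 0.3],
]

one_gene = granules(expr, genes=[0])
two_genes = granules(expr, genes=[0, 1])
print(information_granularity(one_gene), information_entropy(one_gene))
print(information_granularity(two_genes), information_entropy(two_genes))
```

Under these assumptions, adding the second gene splits one granule, so granularity falls and entropy rises. A gene selection procedure could rank genes by how much they change such a metric, although the two algorithms proposed in the paper are not specified in the abstract.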

Acknowledgements

The authors would like to thank the editors and the anonymous reviewers for their valuable comments and suggestions, which have helped immensely in improving the quality of the paper. This work is supported by the Doctoral Research Start Project (CZ2021YJRC01), the Natural Science Foundation of Guangxi (2020GXNSFAA159155, AD19245102) and the Special Scientific Research Project of Young Innovative Talents in Guangxi (2019AC20052).

Author information

Corresponding authors

Correspondence to Qinli Zhang or Ching-Feng Wen.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Li, Z., Zhang, Q., Wang, P. et al. Uncertainty measurement for a gene space based on class-consistent technology: an application in gene selection. Appl Intell 53, 5416–5436 (2023). https://doi.org/10.1007/s10489-022-03657-3

  • DOI: https://doi.org/10.1007/s10489-022-03657-3
