
Joint neighborhood entropy-based gene selection method with fisher score for tumor classification

Applied Intelligence

Abstract

Tumor classification is one of the most important technologies for cancer diagnosis. Because gene expression data are high-dimensional, gene selection (finding a small set of closely related genes that classifies tumors accurately) is an important step for improving classification performance. The traditional rough set model, a classical attribute reduction method, handles only discrete data. Gene expression data containing real-valued or noisy values must therefore usually be discretized first, which may lead to poor classification accuracy. In this paper, a novel gene selection method for tumor classification based on neighborhood rough sets, entropy measures, and the Fisher score is proposed. It can handle real-valued data while preserving the original classification information of the genes. First, the Fisher score is employed to eliminate irrelevant genes, which significantly reduces computational complexity. Next, several neighborhood entropy-based uncertainty measures are investigated for handling the uncertainty and noise of gene expression data; some of their properties are derived, and the relationships among these measures are established. Finally, a joint neighborhood entropy-based gene selection algorithm with the Fisher score is presented to improve the classification performance of gene expression data. Experimental results on an illustrative example and several public gene expression data sets show that the proposed method is very effective at selecting the most relevant genes with high classification accuracy.
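The abstract names a two-stage pipeline (Fisher score pre-filtering, then neighborhood entropy-based selection) without giving formulas. The sketch below illustrates that kind of pipeline under stated assumptions and is not the authors' algorithm. It uses the standard Fisher score F_j = Σ_k n_k (μ_kj − μ_j)² / Σ_k n_k σ_kj². For the entropy stage it substitutes a common formulation from the neighborhood rough set literature, NH(B) = −(1/n) Σ_i log(|δ_B(x_i)|/n) and joint entropy NH(B, D) = −(1/n) Σ_i log(|δ_B(x_i) ∩ [x_i]_D|/n), which may differ from the paper's own measures. It then greedily adds the gene that most reduces NH(D | B) = NH(B, D) − NH(B). Euclidean distance on min-max scaled genes is also an assumption, as are the function names and the parameters `top_k`, `delta`, and `eps`.

```python
import math
from collections import defaultdict


def fisher_scores(X, y):
    """Fisher score per gene: F_j = sum_k n_k (mu_kj - mu_j)^2 / sum_k n_k var_kj."""
    n, m = len(X), len(X[0])
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    scores = []
    for j in range(m):
        mu = sum(row[j] for row in X) / n
        between = within = 0.0
        for rows in groups.values():
            n_k = len(rows)
            mu_k = sum(r[j] for r in rows) / n_k
            between += n_k * (mu_k - mu) ** 2
            within += sum((r[j] - mu_k) ** 2 for r in rows)  # = n_k * var_kj
        if within > 0:
            scores.append(between / within)
        else:
            # Zero within-class spread: perfect separator or constant gene.
            scores.append(math.inf if between > 0 else 0.0)
    return scores


def min_max(X):
    """Scale every gene to [0, 1] so one radius delta fits all genes (assumption)."""
    cols = list(zip(*X))
    lo, hi = [min(c) for c in cols], [max(c) for c in cols]
    return [[(v - a) / (b - a) if b > a else 0.0 for v, a, b in zip(row, lo, hi)]
            for row in X]


def neighborhoods(X, genes, delta):
    """Euclidean delta-neighborhood of each sample in the subspace of `genes`."""
    proj = [[row[g] for g in genes] for row in X]
    return [{k for k, q in enumerate(proj) if math.dist(p, q) <= delta}
            for p in proj]


def neighborhood_entropies(X, y, genes, delta):
    """Return (NH(B), NH(B, D)) for gene subset B and decision D (class labels)."""
    n = len(X)
    h_b = h_bd = 0.0
    for i, nb in enumerate(neighborhoods(X, genes, delta)):
        same = sum(1 for k in nb if y[k] == y[i])  # |delta_B(x_i) & [x_i]_D| >= 1
        h_b -= math.log(len(nb) / n) / n
        h_bd -= math.log(same / n) / n
    return h_b, h_bd


def cond_entropy(X, y, genes, delta):
    """NH(D | B) = NH(B, D) - NH(B); 0 means every neighborhood is class-pure."""
    h_b, h_bd = neighborhood_entropies(X, y, genes, delta)
    return h_bd - h_b


def select_genes(X, y, top_k=50, delta=0.15, eps=1e-6):
    """Fisher-score pre-filter, then greedy forward selection minimizing NH(D | B)."""
    scores = fisher_scores(X, y)
    candidates = sorted(range(len(scores)), key=lambda j: -scores[j])[:top_k]
    Xn = min_max(X)
    selected, best = [], math.inf
    while candidates:
        h = {g: cond_entropy(Xn, y, selected + [g], delta) for g in candidates}
        g_best = min(h, key=h.get)  # ties go to the higher Fisher score
        if best - h[g_best] <= eps:  # no meaningful reduction: stop
            break
        selected.append(g_best)
        candidates.remove(g_best)
        best = h[g_best]
        if best <= eps:  # selected genes already yield class-pure neighborhoods
            break
    return selected


if __name__ == "__main__":
    # Toy data: gene 0 separates the classes, gene 1 is noisy, gene 2 is constant.
    X = [[0.10, 5.0, 1.0], [0.20, 3.0, 1.0], [0.15, 4.0, 1.0],
         [0.90, 4.5, 1.0], [0.80, 3.5, 1.0], [0.85, 5.0, 1.0]]
    y = [0, 0, 0, 1, 1, 1]
    print(select_genes(X, y, top_k=3, delta=0.15))  # prints [0]
```

The Fisher filter runs first because each entropy evaluation costs O(n²·|B|) distance computations, so shrinking thousands of genes to `top_k` candidates is what keeps the greedy stage tractable. Stopping once NH(D | B) stops decreasing by more than `eps` is one simple criterion among several possible ones.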


Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7



Acknowledgments

This work was partially supported by the National Natural Science Foundation of China (Grants 61772176, 61402153, 61672332, 61370169, and 61472042), the China Postdoctoral Science Foundation (Grant 2016M602247), the Plan for Scientific Innovation Talent of Henan Province (Grant 184100510003), the Key Project of Science and Technology Department of Henan Province (Grant 182102210362), the Young Scholar Program of Henan Province (Grant 2017GGJS041), the Key Scientific and Technological Project of Xinxiang City (Grant CXGG17002), and the Ph.D. Research Foundation of Henan Normal University (Grants qd15132 and qd15129).

Author information


Correspondence to Jiu-Cheng Xu.


About this article


Cite this article

Sun, L., Zhang, XY., Qian, YH. et al. Joint neighborhood entropy-based gene selection method with fisher score for tumor classification. Appl Intell 49, 1245–1259 (2019). https://doi.org/10.1007/s10489-018-1320-1



  • DOI: https://doi.org/10.1007/s10489-018-1320-1
