Abstract
The ‘curse of dimensionality’, driven by the massive volume of high-dimensional medical datasets generated by biomedical applications, complicates the data analysis required for precise medical diagnosis. An efficient feature selection technique that finds the optimal feature subset is a promising solution to this challenge; it also improves the accuracy and reduces the computational complexity of the learning model. State-of-the-art feature selection techniques based on heuristic and statistical functions face significant limitations in measures such as classification accuracy and time complexity. Hence, this paper presents RSHGT, a feature selection technique based on rough set theory and hypergraphs, to identify the optimal feature subset for accurate medical diagnosis. Experimental validation on six medical datasets from the Kent Ridge Biomedical dataset repository demonstrates the efficiency of RSHGT in terms of reduct size, accuracy, precision, recall, and time complexity.
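The abstract does not spell out the RSHGT procedure itself. As a generic illustration of why rough sets and hypergraphs fit together, and not the authors' algorithm, recall a standard result: in a consistent decision table, the reducts are exactly the minimal transversals of the hypergraph whose hyperedges are the decision-relative discernibility sets (for each pair of objects with different decisions, the attributes on which they differ). The sketch below assumes a tiny hypothetical table and hypothetical helper names (`discernibility_edges`, `minimal_transversals`, `dependency_degree`). It builds that hypergraph, enumerates its minimal transversals by brute force, and checks each one with Pawlak's dependency degree γ.

```python
from itertools import combinations

# Hypothetical helper names; a generic rough-set/hypergraph illustration,
# not the RSHGT algorithm from the paper.


def discernibility_edges(rows, decisions):
    """Hyperedges: for every pair of objects with different decisions,
    the set of condition-attribute indices on which the two objects differ."""
    edges = set()
    for i, j in combinations(range(len(rows)), 2):
        if decisions[i] != decisions[j]:
            diff = frozenset(
                a for a, (x, y) in enumerate(zip(rows[i], rows[j])) if x != y
            )
            if diff:  # an empty set would mean the table is inconsistent
                edges.add(diff)
    return edges


def minimal_transversals(edges, n_attrs):
    """Enumerate minimal transversals (minimal hitting sets) by increasing size.
    Brute force: exponential in n_attrs, so only suitable for toy tables."""
    found = []
    for k in range(n_attrs + 1):
        for cand in combinations(range(n_attrs), k):
            s = frozenset(cand)
            hits_all = all(s & e for e in edges)
            # Skip supersets of an already-found transversal (not minimal).
            if hits_all and not any(t <= s for t in found):
                found.append(s)
    return found


def dependency_degree(rows, decisions, attrs):
    """Pawlak's dependency degree gamma_B(D): fraction of objects whose
    equivalence class under attribute subset B contains a single decision."""
    def key(row):
        return tuple(row[a] for a in sorted(attrs))

    labels = {}
    for row, d in zip(rows, decisions):
        labels.setdefault(key(row), set()).add(d)
    pure = sum(1 for row in rows if len(labels[key(row)]) == 1)
    return pure / len(rows)


# Hypothetical toy decision table: 4 objects, 3 condition attributes (0, 1, 2).
rows = [(1, 0, 1), (1, 1, 0), (0, 0, 1), (0, 1, 1)]
decisions = ["yes", "no", "no", "yes"]

edges = discernibility_edges(rows, decisions)
reducts = minimal_transversals(edges, n_attrs=3)
for r in reducts:
    print(sorted(r), dependency_degree(rows, decisions, r))
```

On this toy table the only minimal transversal, and hence the only reduct, is {0, 1} with γ = 1.0; removing either attribute drops γ to 0, while attribute 2 is redundant. Brute-force enumeration is used here only for clarity; high-dimensional microarray data with thousands of attributes would need a far more efficient transversal search.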
Funding
This work was supported by the Department of Science and Technology, India, and the TATA Realty – SASTRA Srinivasa Ramanujan Research Cell (Grant Nos. SR/FST/MSI-107/2015, MRT/2017/000155, and SR/FST/ETI-349/2013).
Ethics declarations
Conflict of interest
All the authors declare that they do not have any conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Communicated by V. Loia.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Gauthama Raman, M.R., Nivethitha, S., Kannan, K. et al. A hybrid approach using rough set theory and hypergraph for feature selection on high-dimensional medical datasets. Soft Comput 23, 12655–12672 (2019). https://doi.org/10.1007/s00500-019-03818-6
DOI: https://doi.org/10.1007/s00500-019-03818-6