Abstract
The high dimensionality and sample imbalance of gene expression data call for effective classification algorithms. To better distinguish subtypes in gene expression data, we devise a hypervolume-based discrete evolutionary optimization algorithm (HYBDEOA). Four objectives (the number of genes, accuracy, relevance, and redundancy) are optimized simultaneously to guide the evolution. First, binary encoding selects subsets of features, projecting the data onto different subspaces. Next, a discrete neighborhood operation generates a new binary-mapped population. The new and current populations are then combined, and a hypervolume-based mechanism selects the Pareto solutions. Finally, a discrete mutation method searches for promising solutions in the binary search space. To evaluate its performance, we apply HYBDEOA to 55 synthetic datasets and 35 cancer gene expression datasets, and conduct extensive experiments on its effectiveness and efficiency. The experimental results show that the proposed method is parameter-less and robust, and that it groups gene expression data into a finer and more informative classification.
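The building blocks named in the abstract (binary feature masks, a discrete mutation, Pareto selection, and the hypervolume indicator) can be illustrated with a minimal Python sketch. This is not the authors' HYBDEOA implementation: the function names are hypothetical, the mutation is a generic bit flip standing in for the paper's discrete mutation, all objectives are assumed to be minimized, and the hypervolume is computed for two objectives only, whereas the paper optimizes four.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]  # one objective vector; every objective is minimized (assumption)


def select_features(row: Sequence[float], mask: Sequence[int]) -> List[float]:
    """Project one sample onto the subspace chosen by a binary mask (1 = keep gene)."""
    return [x for x, bit in zip(row, mask) if bit]


def bit_flip_mutation(mask: Sequence[int], rate: float, rng: random.Random) -> List[int]:
    """Flip each bit independently with probability `rate` (generic stand-in mutation)."""
    return [1 - bit if rng.random() < rate else bit for bit in mask]


def dominates(a: Point, b: Point) -> bool:
    """True if `a` is no worse than `b` everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: Sequence[Point]) -> List[Point]:
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]


def hypervolume_2d(points: Sequence[Point], ref: Point) -> float:
    """Area dominated by a two-objective front and bounded by the reference point."""
    front = sorted(p for p in pareto_front(points) if p[0] < ref[0] and p[1] < ref[1])
    area, prev_y = 0.0, ref[1]
    for x, y in front:  # x ascending, so y is descending along the front
        area += (ref[0] - x) * (prev_y - y)
        prev_y = y
    return area


def hv_contribution(points: Sequence[Point], ref: Point, p: Point) -> float:
    """Hypervolume lost when `p` is removed; small values mark candidates to discard."""
    rest = [q for q in points if q != p]
    return hypervolume_2d(points, ref) - hypervolume_2d(rest, ref)


if __name__ == "__main__":
    rng = random.Random(0)
    mask = [1, 0, 1, 0]
    print(select_features([5.0, 1.0, 7.0, 2.0], mask))  # [5.0, 7.0]
    print(bit_flip_mutation(mask, rate=0.25, rng=rng))
    candidates = [(1, 3), (2, 2), (3, 1), (3, 3)]
    print(pareto_front(candidates))                      # (3, 3) is dominated
    print(hypervolume_2d(candidates, ref=(4, 4)))        # 6.0
```

In a hypervolume-based selection step, the combined population would typically be reduced by repeatedly discarding the solution with the smallest contribution; `hv_contribution` shows that quantity for the two-objective case.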

Acknowledgements
This research is supported by the National Natural Science Foundation of China under Grant No. 61603087, by the Natural Science Foundation of Jilin Province under Grant No. 20190103006JH, and by the Science and Technology Development Planning of Jilin Province, No. 20160204043GX. The work described in this paper was substantially supported by two grants from the Research Grants Council of the Hong Kong Special Administrative Region [CityU 11203217] and [CityU 11200218] and by funding from the Hong Kong Institute for Data Science (HKIDS) at City University of Hong Kong. It was also partially supported by a grant from City University of Hong Kong (CityU 11202219).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Wang, Y., Li, S., Wang, L. et al. Cancer molecular subtype classification from hypervolume-based discrete evolutionary optimization. Neural Comput & Applic 32, 15489–15502 (2020). https://doi.org/10.1007/s00521-020-04846-2
DOI: https://doi.org/10.1007/s00521-020-04846-2