An entropy-based classification of breast cancerous genes using microarray data

Mondal, Mausami; Semwal, Rahul; Raj, Utkarsh; Aier, Imlimaong; Varadwaj, Pritish Kumar

doi:10.1007/s00521-018-3864-8

Original Article
Published: 10 November 2018

Neural Computing and Applications, Volume 32, pages 2397–2404 (2020)

Abstract

Gene expression levels measured with microarrays offer a promising basis for classifying cancer samples. Because microarray datasets are high-dimensional, redundant genes must be removed so that only significant genes are used to build the classifier. In this work, an entropy-based supervised learning method was used to distinguish normal tissue from breast tumor tissue on the basis of their gene expression profiles. Four widely used machine learning techniques were applied to breast cancer prediction: support vector machine (SVM), random forest, k-nearest neighbor (KNN) and naive Bayes. Their performance was evaluated with four classification performance measures, and SVM achieved higher accuracy than the other algorithms, reaching a classification accuracy of 91.5% with an F1 score of 0.833. The techniques were further compared using ROC curves and calibration plots.
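The abstract does not spell out the entropy-based selection step. One common entropy-based criterion is information gain: the class entropy H(Y) minus the size-weighted entropy of the classes after splitting a single gene's expression at a threshold. The Python sketch below ranks genes by that criterion and computes accuracy and F1, two of the measures reported above. It is an illustration under stated assumptions, not the authors' implementation: the single-threshold discretization, the function names (`entropy`, `best_split_gain`, `rank_genes`, `accuracy_and_f1`), the gene names and the toy data are all hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy H(Y), in bits, of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_split_gain(values: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Information gain of the best binary threshold on one gene's expression.

    Candidate thresholds are midpoints between consecutive distinct sorted
    values (an assumed discretization). Returns (gain, threshold); a gene
    with constant expression yields (0.0, nan).
    """
    base = entropy(labels)
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_gain, best_thr = 0.0, float("nan")
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between two equal values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        # Conditional entropy H(Y | split), weighted by partition size.
        cond = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
        gain = base - cond
        if gain > best_gain:
            best_gain, best_thr = gain, (pairs[i - 1][0] + pairs[i][0]) / 2
    return best_gain, best_thr


def rank_genes(expr: Dict[str, Sequence[float]], labels: Sequence[int],
               k: int) -> List[Tuple[str, float]]:
    """Return the k genes with the highest information gain, best first."""
    scored = [(gene, best_split_gain(vals, labels)[0]) for gene, vals in expr.items()]
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:k]


def accuracy_and_f1(y_true: Sequence[int], y_pred: Sequence[int],
                    positive: int = 1) -> Tuple[float, float]:
    """Accuracy and F1 = 2TP / (2TP + FP + FN) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    acc = sum(1 for t, p in pairs if t == p) / len(pairs)
    denom = 2 * tp + fp + fn
    return acc, (2 * tp / denom if denom else 0.0)


# Hypothetical toy data: 0 = normal tissue, 1 = tumor; gene names are made up.
labels = [0, 0, 0, 1, 1, 1]
expr = {
    "GENE_A": [1.0, 1.2, 0.9, 3.1, 2.8, 3.3],  # separates the classes perfectly
    "GENE_B": [2.0, 1.0, 3.0, 2.5, 1.5, 2.2],  # weakly informative
    "GENE_C": [5.0, 5.0, 5.0, 5.0, 5.0, 5.0],  # constant, carries no information
}

if __name__ == "__main__":
    print(rank_genes(expr, labels, k=2))
    print(accuracy_and_f1([1, 1, 0, 0, 1, 0], [1, 0, 0, 0, 1, 1]))
```

In a full pipeline the top-ranked genes would then be passed to the classifiers (SVM, random forest, KNN, naive Bayes); that step, and the ROC and calibration analysis, are omitted from this sketch.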

Acknowledgements

The authors acknowledge the Department of Bioinformatics and Applied Sciences, Indian Institute of Information Technology, Allahabad, for providing computing facilities.

Author information

Mausami Mondal, Rahul Semwal, Utkarsh Raj, and Imlimaong Aier have contributed equally to this work.

Authors and Affiliations

Department of Bioinformatics and Applied Sciences, Indian Institute of Information Technology - Allahabad, Prayagraj, 211015, India
Mausami Mondal, Rahul Semwal, Utkarsh Raj, Imlimaong Aier & Pritish Kumar Varadwaj

Corresponding author

Correspondence to Pritish Kumar Varadwaj.

Ethics declarations

Conflict of interest

The authors have no conflict of interest.

About this article

Cite this article

Mondal, M., Semwal, R., Raj, U. et al. An entropy-based classification of breast cancerous genes using microarray data. Neural Comput & Applic 32, 2397–2404 (2020). https://doi.org/10.1007/s00521-018-3864-8

Received: 21 July 2017
Accepted: 31 October 2018
Published: 10 November 2018
Issue Date: April 2020
DOI: https://doi.org/10.1007/s00521-018-3864-8
