Abstract
When leukocytes are automatically segmented from the complex morphological background of tissue section images, a large number of artifacts and noise objects are extracted along with them, which generates a large amount of multivariate data. This high-dimensional data degrades a classifier's ability to discriminate between leukocytes and artifacts/noise. Selecting the most prominent features, rather than working in the full high-dimensional feature space, reduces computational complexity and improves classifier performance. This paper therefore introduces a novel Gini importance-based binary random forest feature selection method. A random forest classifier is then used to classify the extracted objects into artifacts, mononuclear cells, and polymorphonuclear cells. The experimental results show that the proposed method effectively eliminates irrelevant features while maintaining high classification accuracy compared with other feature reduction methods.
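As a rough illustration of the quantity behind Gini importance, the sketch below computes the Gini impurity of a set of class labels and the largest impurity decrease that a single threshold split on each feature can achieve, then ranks the features by that decrease. In a random forest, such decreases are accumulated over all nodes and all trees. This is not the paper's binary random forest method: the feature names (`area`, `noise`), the toy values, and the helper functions are hypothetical, and only one root-node split is considered.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_gini_decrease(values, labels):
    """Largest impurity decrease from one threshold split on a feature."""
    n = len(labels)
    parent = gini(labels)
    best = 0.0
    # Candidate thresholds: every distinct value except the largest,
    # so that both children are non-empty.
    for t in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        # Size-weighted impurity of the two children.
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - child)
    return best


# Hypothetical toy data: six segmented objects, three classes.
labels = ["artifact", "artifact", "mono", "mono", "poly", "poly"]
features = {
    "area": [1.0, 1.2, 5.0, 5.5, 9.0, 9.4],   # separates the classes well
    "noise": [3.0, 7.0, 2.0, 8.0, 1.0, 9.0],  # largely unrelated to class
}

# Score each feature and rank from most to least informative.
scores = {name: best_gini_decrease(col, labels) for name, col in features.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

A feature whose score stays near zero, such as `noise` here, is the kind of candidate that an importance-based selection step would discard before the final classification.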
Acknowledgments
The authors are thankful to the Defence Research and Development Establishment (DRDE), Gwalior, India, for funding part of this work under the project (DRDE-P1-2011/Task-190). The authors also thank Dr. S. C. Pant, Scientist ‘F’ at D.R.D.E., Gwalior, and Dr. Vinay Lomash, Ph.D., MVSc (Pathology), for their valuable help in the analysis of microscopic images.
Cite this article
Saraswat, M., Arya, K.V. Feature selection and classification of leukocytes using random forest. Med Biol Eng Comput 52, 1041–1052 (2014). https://doi.org/10.1007/s11517-014-1200-8
DOI: https://doi.org/10.1007/s11517-014-1200-8