Feature selection and classification of leukocytes using random forest

  • Original Article
Medical & Biological Engineering & Computing

Abstract

In automatic segmentation of leukocytes from the complex morphological background of tissue section images, a vast number of artifacts and noise objects are also extracted, which generates a large amount of multivariate data. This multivariate data degrades the ability of a classifier to discriminate between leukocytes and artifacts/noise. Selecting the prominent features, however, plays an important role in reducing computational complexity and increasing classifier performance compared with working in the high-dimensional feature space. Therefore, this paper introduces a novel Gini importance-based binary random forest feature selection method. Moreover, a random forest classifier is used to classify the extracted objects into artifacts, mononuclear cells, and polymorphonuclear cells. The experimental results establish that the proposed method effectively eliminates irrelevant features while maintaining high classification accuracy compared with other feature reduction methods.
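The Gini importance named in the abstract is built from the impurity decrease achieved at each tree split. The following is a minimal sketch of that building block, assuming the standard Gini impurity definition used in random forests; it is not the authors' binary feature selection procedure, whose details are not given in the abstract. The function names, the feature names (`area`, `texture`), and the toy labels are hypothetical and chosen only for illustration.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def gini_decrease(parent, left, right):
    """Impurity decrease when `parent` is split into `left` and `right`."""
    n = len(parent)
    weighted = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(parent) - weighted


def accumulate_importance(splits):
    """Sum impurity decreases per feature over (feature, decrease) pairs.

    Simplification: a real forest also weights each decrease by the share
    of samples reaching the node and averages over all trees.
    """
    totals = {}
    for feature, decrease in splits:
        totals[feature] = totals.get(feature, 0.0) + decrease
    return totals


# Toy node (hypothetical data): 4 artifacts, 2 mononuclear, 2 polymorphonuclear.
parent = ["artifact"] * 4 + ["mono"] * 2 + ["poly"] * 2
left = ["artifact"] * 4                # a pure child node
right = ["mono"] * 2 + ["poly"] * 2    # a still-mixed child node

# Hypothetical splits: two on "area", one on "texture".
splits = [
    ("area", gini_decrease(parent, left, right)),
    ("texture", 0.125),
    ("area", 0.125),
]
importance = accumulate_importance(splits)
```

Under this sketch, a feature whose splits separate the classes well accumulates a larger total, so ranking features by the accumulated value gives one plausible basis for discarding irrelevant ones.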

Acknowledgments

The authors thank the Defence Research and Development Establishment (DRDE), Gwalior, India, for funding part of this work under the project (DRDE-P1-2011/Task-190). The authors also thank Dr. S. C. Pant, Scientist ‘F’ at DRDE, Gwalior, and Dr. Vinay Lomash, Ph.D., MVSc (Pathology), for their valuable help in the analysis of microscopic images.

Author information

Corresponding author

Correspondence to Mukesh Saraswat.

About this article

Cite this article

Saraswat, M., Arya, K.V. Feature selection and classification of leukocytes using random forest. Med Biol Eng Comput 52, 1041–1052 (2014). https://doi.org/10.1007/s11517-014-1200-8

  • DOI: https://doi.org/10.1007/s11517-014-1200-8
