Classification of breast masses in mammograms using genetic programming and feature selection

  • Original Article
Medical and Biological Engineering and Computing

Abstract

Mammography is a widely used screening tool and is the gold standard for the early detection of breast cancer. The classification of breast masses into benign and malignant categories is an important problem in computer-aided diagnosis of breast cancer. A small dataset of 57 breast mass images, each with 22 computed features, was used in this investigation; the same dataset has been used in previous studies. The extracted features relate to edge-sharpness, shape, and texture. The novelty of this paper is the adaptation and application of the classification technique called genetic programming (GP), which performs feature selection implicitly. To refine the pool of features available to the GP classifier, we used feature-selection methods, including three statistical measures introduced for this purpose: Student’s t test, the Kolmogorov–Smirnov test, and the Kullback–Leibler divergence. Both the training and test accuracies obtained were high: above 99.5% for training and typically above 98% for test experiments. A leave-one-out experiment showed 97.3% success in the classification of benign masses and 95.0% success in the classification of malignant tumors. A shape feature known as fractional concavity was found to be the most important among those tested, since it was automatically selected by the GP classifier in almost every experiment.
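As an illustration of how the three statistical measures named in the abstract can rank features by class separability, the following is a minimal Python sketch, not the authors' implementation. The data are synthetic, the class sizes and feature count are arbitrary, the function names are hypothetical, and the Kullback–Leibler score assumes each class is Gaussian, which the abstract does not state.

```python
import math
import random
from statistics import mean, variance


def t_statistic(a, b):
    """Absolute Welch two-sample t statistic (unequal variances)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return abs(mean(a) - mean(b)) / se


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    d = 0.0
    for x in list(a) + list(b):
        cdf_a = sum(v <= x for v in a) / len(a)
        cdf_b = sum(v <= x for v in b) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def symmetric_kl_gaussian(a, b):
    """Symmetrised KL divergence under a Gaussian model per class (an assumption)."""
    m1, m2, v1, v2 = mean(a), mean(b), variance(a), variance(b)
    kl_12 = 0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)
    kl_21 = 0.5 * (math.log(v1 / v2) + (v2 + (m1 - m2) ** 2) / v1 - 1.0)
    return kl_12 + kl_21


def rank_features(class_a, class_b, score):
    """Return feature indices ordered from most to least discriminative."""
    n_features = len(class_a[0])
    scores = [
        score([row[j] for row in class_a], [row[j] for row in class_b])
        for j in range(n_features)
    ]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


# Synthetic stand-in data: three features, only feature 1 separates the classes.
rng = random.Random(0)
benign = [[rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(30)]
malignant = [[rng.gauss(0, 1), rng.gauss(3, 1), rng.gauss(0, 1)] for _ in range(27)]

measures = {
    "t test": t_statistic,
    "Kolmogorov-Smirnov": ks_statistic,
    "Kullback-Leibler": symmetric_kl_gaussian,
}
rankings = {name: rank_features(benign, malignant, fn) for name, fn in measures.items()}
for name, order in rankings.items():
    print(f"{name}: feature ranking {order}")
```

In a pipeline of the kind the abstract describes, the top-ranked features from each measure would form the reduced pool offered to the classifier; how many features to retain is a design choice this sketch does not address.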

Acknowledgments

This research was partly funded by the Medical Research Council, UK, through the InterDisciplinary Bridging Awards (IDBA) scheme, and by a grant from the University of Calgary Research Grants Committee. The authors would like to thank Mr. L. Zhang, a research student at the University of Liverpool, for his initial assistance with the genetic programming code.

Corresponding author

Correspondence to A. K. Nandi.

Cite this article

Nandi, R.J., Nandi, A.K., Rangayyan, R.M. et al. Classification of breast masses in mammograms using genetic programming and feature selection. Med Bio Eng Comput 44, 683–694 (2006). https://doi.org/10.1007/s11517-006-0077-6

  • DOI: https://doi.org/10.1007/s11517-006-0077-6