
Transductive cost-sensitive lung cancer image classification

Applied Intelligence

Abstract

Previous computer-aided lung cancer image classification methods are all cost-blind: they assume that the two kinds of misdiagnosis (categorizing a cancerous image as normal, or categorizing a normal image as cancerous) incur equal costs. In addition, previous methods usually require experienced pathologists to label a large number of images as training samples. To address both issues, a novel transductive cost-sensitive method is proposed for lung cancer image classification on needle biopsy specimens, which requires the pathologist to label only a small number of images. The proposed method analyzes lung cancer images in the following procedures: (i) an image capturing procedure that captures images from the needle biopsy specimens; (ii) a preprocessing procedure that segments the individual cells from the captured images; (iii) a feature extraction procedure that extracts features (i.e. shape, color, texture and statistical information) from the segmented cells; (iv) a codebook learning procedure that learns a codebook from the extracted features with k-means clustering, so that each image is represented as a histogram over the codewords; (v) an image classification procedure that predicts labels for the testing images using the proposed multi-class cost-sensitive Laplacian regularized least squares (mCLRLS). We evaluate the proposed method on a real image set provided by Bayi Hospital, which contains 271 images, including normal ones and four types of cancerous ones (squamous carcinoma, adenocarcinoma, small cell cancer and nuclear atypia). The experimental results demonstrate that the proposed method achieves a lower cancer-misdiagnosis rate and lower total misdiagnosis costs than previous methods, including supervised learning approaches (kNN, mcSVM and MCMI-AdaBoost), a semi-supervised learning approach (LapRLS) and a cost-sensitive approach (CS-SVM).
The experiments also show that both the transductive and the cost-sensitive settings are useful when only a small number of training images are available.
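The abstract does not spell out the mCLRLS objective, so the following is only a hedged illustration of how Laplacian regularized least squares (LapRLS, Belkin et al. 2006) can be made both transductive and cost-sensitive; it is not the authors' formulation. The testing images are included as unlabeled nodes of the graph, and each labeled squared loss is multiplied by a cost weight c_i. With l labeled and u unlabeled images, kernel matrix K, graph Laplacian L, one-vs-rest target matrix Y (zero rows for unlabeled images) and C = diag(c_1, ..., c_l, 0, ..., 0), setting the gradient to zero gives alpha = (C K + gamma_A l I + gamma_I l/(l+u)^2 L K)^(-1) C Y, which reduces to standard LapRLS when C is the 0/1 labeled-indicator matrix. Assumptions not taken from the source: a Gaussian kernel that is also reused as the graph edge weights, +1/-1 one-vs-rest targets, c_i equal to the row sum of a cost matrix (rows = true class, columns = predicted class) for the true class of image i, and arbitrary hyperparameter values. The function names are hypothetical.

```python
import numpy as np

def rbf_kernel(X, Z, gamma=1.0):
    """Gaussian kernel matrix exp(-gamma * ||x - z||^2) between rows of X and Z."""
    sq = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)

def cost_sensitive_laprls(X_lab, y_lab, X_unl, cost,
                          gamma_k=1.0, gamma_a=1e-2, gamma_i=1e-1):
    """Hypothetical transductive, cost-weighted multi-class LapRLS.

    Returns predicted class indices for the unlabeled (testing) rows X_unl.
    """
    X_lab = np.asarray(X_lab, dtype=float)
    X_unl = np.asarray(X_unl, dtype=float)
    y_lab = np.asarray(y_lab, dtype=int)
    cost = np.asarray(cost, dtype=float)
    X = np.vstack([X_lab, X_unl])                # labeled rows first, then testing rows
    l, n, m = len(X_lab), len(X), cost.shape[0]
    K = rbf_kernel(X, X, gamma_k)
    W = K - np.diag(np.diag(K))                  # assumed graph: kernel weights, no self-loops
    L = np.diag(W.sum(axis=1)) - W               # unnormalized graph Laplacian
    Y = np.zeros((n, m))
    Y[:l] = -1.0                                 # one-vs-rest targets for labeled rows
    Y[np.arange(l), y_lab] = 1.0
    c = np.zeros(n)
    c[:l] = cost.sum(axis=1)[y_lab]              # assumed weight: total cost of missing the true class
    C = np.diag(c)
    A = C @ K + gamma_a * l * np.eye(n) + (gamma_i * l / n ** 2) * (L @ K)
    alpha = np.linalg.solve(A, C @ Y)
    F = K @ alpha                                # decision values for every image
    return F[l:].argmax(axis=1)

def total_cost(y_true, y_pred, cost):
    """Total misdiagnosis cost: sum of cost[true, predicted] over the testing images."""
    cost = np.asarray(cost, dtype=float)
    return float(cost[np.asarray(y_true), np.asarray(y_pred)].sum())
```

If class 0 is normal and class 1 is cancerous, `cost = np.array([[0, 1], [5, 0]])` is one example of an asymmetric matrix: a cancerous image predicted as normal adds 5 to `total_cost`, while the reverse error adds 1, and labeled cancerous images receive five times the training weight of normal ones.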


Fig. 1
Fig. 2
Fig. 3
Algorithm 1
Fig. 4
Fig. 5
Fig. 6


Notes

  1. The number of clusters is selected by searching from 1 to 20 with 10-fold cross-validation; with 7 clusters, the approaches that rely on codebook learning obtain the best results.

  2. The number of neighbors in kNN is selected by searching from 1 to 10 with 10-fold cross-validation, and the value that gives the best results is chosen.
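The two notes describe a k-means codebook (7 codewords in the reported experiments) and a kNN baseline whose neighbor count is chosen by 10-fold cross-validation. The NumPy sketch below illustrates both steps under assumptions that are not from the source: each image is given as an array of per-cell feature vectors, k-means uses a deterministic farthest-point initialisation, histograms are normalized to sum to 1, and ties in the kNN vote go to the smaller class index. All function names are hypothetical, and the sketch does not reproduce the cluster-count search of Note 1, which would also require rebuilding the histograms for every candidate codebook size.

```python
import numpy as np

def sq_dists(A, B):
    """Pairwise squared Euclidean distances between rows of A and rows of B."""
    return ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)

def learn_codebook(features, k, n_iter=50):
    """Lloyd's k-means with farthest-point initialisation; returns k codewords."""
    features = np.asarray(features, dtype=float)
    centres = [features[0]]
    for _ in range(1, k):                        # next centre: point farthest from chosen ones
        d = sq_dists(features, np.array(centres)).min(axis=1)
        centres.append(features[d.argmax()])
    centres = np.array(centres)
    for _ in range(n_iter):
        assign = sq_dists(features, centres).argmin(axis=1)
        for j in range(k):
            members = features[assign == j]
            if len(members):                     # keep the old centre if a cluster empties
                centres[j] = members.mean(axis=0)
    return centres

def image_histogram(cell_features, codebook):
    """Normalized histogram of nearest codewords over one image's cells."""
    words = sq_dists(np.asarray(cell_features, dtype=float), codebook).argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

def knn_predict(X_train, y_train, X_test, n_neighbors):
    """Majority vote among the nearest training histograms."""
    nearest = np.argsort(sq_dists(X_test, X_train), axis=1)[:, :n_neighbors]
    return np.array([np.bincount(v).argmax() for v in y_train[nearest]])

def select_neighbors_by_cv(X, y, candidates=range(1, 11), n_folds=10, seed=0):
    """Return (neighbor count, mean accuracy) with the best n-fold CV accuracy.

    Assumes len(X) >= n_folds so that no testing fold is empty.
    """
    folds = np.array_split(np.random.default_rng(seed).permutation(len(X)), n_folds)
    best = (None, -1.0)
    for n_nb in candidates:
        accs = []
        for i, test_idx in enumerate(folds):
            train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
            pred = knn_predict(X[train_idx], y[train_idx], X[test_idx], n_nb)
            accs.append(np.mean(pred == y[test_idx]))
        if np.mean(accs) > best[1]:              # strict '>' keeps the smallest best count
            best = (n_nb, float(np.mean(accs)))
    return best
```

For instance, `learn_codebook(np.vstack(images), 7)` would build a 7-word codebook from the cells of all training images, after which `image_histogram` maps each image to a 7-bin vector that `select_neighbors_by_cv` can search over.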


Acknowledgements

The authors would like to acknowledge the support for this work from the National Science Foundation of China (Grant Nos. 61035003, 61175042, 61021062), the National 973 Program of China (Grant No. 2009CB320702), the 973 Program of Jiangsu, China (Grant No. BK2011005) and Program for New Century Excellent Talents in University (Grant No. NCET-10-0476). The authors wish to thank the anonymous reviewers for their valuable suggestions.

Author information

Corresponding author

Correspondence to Yang Gao.


About this article

Cite this article

Shi, Y., Gao, Y., Wang, R. et al. Transductive cost-sensitive lung cancer image classification. Appl Intell 38, 16–28 (2013). https://doi.org/10.1007/s10489-012-0354-z


  • DOI: https://doi.org/10.1007/s10489-012-0354-z
