Abstract
Previous computer-aided lung cancer image classification methods are all cost-blind: they assume that the costs of the two kinds of misdiagnosis (categorizing a cancerous image as normal, or categorizing a normal image as cancerous) are equal. In addition, previous methods usually require experienced pathologists to label a large number of images as training samples. To address these issues, a novel transductive cost-sensitive method is proposed for lung cancer image classification on needle biopsy specimens, which requires the pathologist to label only a small number of images. The proposed method analyzes lung cancer images through the following procedures: (i) an image capturing procedure that captures images from the needle biopsy specimens; (ii) a preprocessing procedure that segments the individual cells from the captured images; (iii) a feature extraction procedure that extracts features (i.e. shape, color, texture and statistical information) from the segmented cells; (iv) a codebook learning procedure that learns a codebook from the extracted features by k-means clustering, so that each image can be represented as a histogram over the codewords; (v) an image classification procedure that predicts labels for the testing images using the proposed multi-class cost-sensitive Laplacian regularized least squares (mCLRLS). We evaluate the proposed method on a real image set provided by Bayi Hospital, which contains 271 images, including normal ones and four types of cancerous ones (squamous carcinoma, adenocarcinoma, small cell cancer and nuclear atypia). The experimental results demonstrate that the proposed method achieves a lower cancer-misdiagnosis rate and lower total misdiagnosis costs compared with previous methods, including supervised learning approaches (kNN, mcSVM and MCMI-AdaBoost), a semi-supervised learning approach (LapRLS) and a cost-sensitive approach (CS-SVM).
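Procedure (iv) is a bag-of-words style representation. The sketch below illustrates it with plain NumPy under stated assumptions: Euclidean k-means (Lloyd's algorithm) with random initialization, cells from all images pooled to learn the codebook, and L1-normalized histograms. The function names, the feature dimensionality and the synthetic data are hypothetical; apart from the cluster count in the Notes, the authors' clustering settings are not given here.

```python
import numpy as np


def learn_codebook(features, k, n_iter=100, seed=0):
    """Lloyd's k-means on pooled cell features; returns a (k, d) codebook."""
    rng = np.random.default_rng(seed)
    # Initialize the codewords with k distinct, randomly chosen feature vectors.
    centers = features[rng.choice(len(features), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Squared Euclidean distance from every feature vector to every codeword.
        d2 = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = d2.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = features[assign == j]
            if len(members) > 0:  # an emptied cluster keeps its old codeword
                new_centers[j] = members.mean(axis=0)
        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break
    return centers


def image_histogram(cell_features, codebook):
    """Assign each cell of one image to its nearest codeword; return a normalized histogram."""
    d2 = ((cell_features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()


# Usage with synthetic data: each "image" is an (n_cells, n_features) array.
rng = np.random.default_rng(0)
images = [rng.normal(size=(30, 4)) for _ in range(5)]
codebook = learn_codebook(np.vstack(images), k=7)  # 7 clusters, as reported in the Notes
X = np.array([image_histogram(img, codebook) for img in images])
print(X.shape)  # (5, 7)
```

Each row of `X` is then a fixed-length image descriptor, regardless of how many cells were segmented from that image, which is what allows a standard classifier to operate on it.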
The experiments also show that both the transductive and the cost-sensitive settings are useful when only a small number of training images is available.
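To make the combination of the transductive and cost-sensitive settings concrete, the sketch below adds per-sample cost weights to kernel LapRLS (Belkin et al. 2006). It is not the authors' mCLRLS, whose formulation is not reproduced in this section. It assumes: (a) each labeled image's squared loss is weighted by the total cost of misclassifying its class; (b) the graph Laplacian is built from a fully connected Gaussian affinity; (c) multi-class output uses one-hot targets with an argmax decision. With weights in a diagonal matrix W (zero for unlabeled images), minimizing (1/l)(Y - K alpha)^T W (Y - K alpha) + gamma_A alpha^T K alpha + (gamma_I/n^2) alpha^T K L K alpha gives (W K + gamma_A l I + (gamma_I l/n^2) L K) alpha = W Y. This reduces to the standard LapRLS solution when W is the labeled-indicator matrix. All function names and parameter values are hypothetical.

```python
import numpy as np


def gaussian_kernel(X, Z, sigma):
    """Gaussian (RBF) kernel matrix between the rows of X and Z."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))


def cost_weighted_laprls(X, y, labeled, cost, sigma=1.0, gamma_a=1e-2, gamma_i=1e-1):
    """Transductive, cost-weighted LapRLS (hypothetical variant, not mCLRLS itself).

    X       : (n, d) descriptors of ALL images, labeled and unlabeled.
    y       : (n,) integer labels; entries of unlabeled rows are ignored.
    labeled : (n,) boolean mask of the labeled rows.
    cost    : (C, C) matrix, cost[a, b] = cost of predicting class b for true class a.
    Returns predicted labels for all n rows.
    """
    n, n_classes = X.shape[0], cost.shape[0]
    idx = np.flatnonzero(labeled)
    l = len(idx)
    K = gaussian_kernel(X, X, sigma)            # ambient (RKHS) kernel
    A = K.copy()                                # graph affinities reuse the Gaussian kernel
    np.fill_diagonal(A, 0.0)
    L = np.diag(A.sum(axis=1)) - A              # unnormalized graph Laplacian
    Y = np.zeros((n, n_classes))
    Y[idx, y[idx]] = 1.0                        # one-hot targets; unlabeled rows stay zero
    w = np.zeros(n)
    w[idx] = cost.sum(axis=1)[y[idx]]           # assumed weight: total cost of erring on that class
    W = np.diag(w)
    # Zero gradient of the weighted objective: (W K + gamma_a*l*I + gamma_i*l/n^2 * L K) alpha = W Y
    M = W @ K + gamma_a * l * np.eye(n) + (gamma_i * l / n ** 2) * (L @ K)
    alpha = np.linalg.solve(M, W @ Y)
    return (K @ alpha).argmax(axis=1)


def total_cost(y_true, y_pred, cost):
    """Total misdiagnosis cost: sum of cost[true, predicted] over all images."""
    return float(cost[y_true, y_pred].sum())


# Toy usage: two classes, only one labeled image per class.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)), rng.normal(4.0, 0.3, (20, 2))])
truth = np.repeat([0, 1], 20)                   # 0 = normal, 1 = cancerous (toy labels)
labeled = np.zeros(40, dtype=bool)
labeled[[0, 20]] = True
cost = np.array([[0.0, 1.0], [5.0, 0.0]])       # missing a cancer costs 5x a false alarm
pred = cost_weighted_laprls(X, truth, labeled, cost)
print(total_cost(truth, pred, cost))
```

The unlabeled images enter only through the Laplacian term, which is what makes the sketch transductive; the cost matrix enters only through the loss weights.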
Notes
The number of clusters is selected by searching from 1 to 20 with 10-fold cross-validation; we found that with 7 clusters, the approaches related to codebook learning obtain the best results.
The number of neighbors in kNN is selected by searching from 1 to 10 with 10-fold cross-validation, and the value giving the best results is chosen.
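Both searches in these notes follow a standard k-fold cross-validation loop. The NumPy sketch below shows the kNN case, assuming Euclidean distance, majority voting and a random (unstratified) fold split, none of which the notes specify; the function names are hypothetical. The same loop could wrap the codebook-size search (1 to 20) by replacing `knn_predict` with the full codebook-plus-classifier pipeline.

```python
import numpy as np


def knn_predict(X_train, y_train, X_test, k):
    """Majority-vote kNN with Euclidean distance."""
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nn = np.argsort(d2, axis=1)[:, :k]        # indices of the k nearest training points
    votes = y_train[nn]                       # (n_test, k) neighbor labels
    n_classes = int(y_train.max()) + 1
    counts = np.apply_along_axis(np.bincount, 1, votes, minlength=n_classes)
    return counts.argmax(axis=1)              # ties go to the smallest label


def select_k_by_cv(X, y, k_values=range(1, 11), n_folds=10, seed=0):
    """Return (best_k, best_accuracy) from n_folds-fold cross-validation."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_folds)
    best_k, best_acc = None, -1.0
    for k in k_values:
        correct = 0
        for i, test_idx in enumerate(folds):
            train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
            pred = knn_predict(X[train_idx], y[train_idx], X[test_idx], k)
            correct += int((pred == y[test_idx]).sum())
        acc = correct / len(X)                # every sample is tested exactly once
        if acc > best_acc:                    # strict '>' keeps the smallest k among ties
            best_k, best_acc = k, acc
    return best_k, best_acc


# Toy usage: 60 random 7-bin descriptors, 5 classes (normal + four cancer types).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 7))
y = rng.integers(0, 5, size=60)
k, acc = select_k_by_cv(X, y)
```

Breaking ties toward the smallest k is one reasonable convention; the notes only say that the value giving the best result is chosen.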
References
Aribarg T, Supratid S, Lursinsap C (2012) Optimizing the modified fuzzy ant-miner for efficient medical diagnosis. Appl Intell. doi:10.1007/s10489-011-0332-x
Belkin M, Niyogi P, Sindhwani V (2006) Manifold regularization: a geometric framework for learning from labeled and unlabeled examples. J Mach Learn Res 7:2399–2434
Cai H, Yan F, Mikolajczyk K (2010) Learning weights for codebook in image classification and retrieval. In: Proceedings of the IEEE international conference on computer vision and pattern recognition (CVPR), pp 2320–2327
Chiang I-J, Shieh M-J, Hsu JY, Wong J-M (2005) Building a medical decision support system for colon polyp screening by using fuzzy classification trees. Appl Intell 22:61–75
Cho S-B, Won H-H (2007) Cancer classification using ensemble of neural networks with multiple significant gene subsets. Appl Intell 26:243–250
Dasovich G, Kim R, Raicu D, Furst J (2010) A model for the relationship between semantic and content based similarity using LIDC. SPIE Med Imaging
Depeursinge A, Racoceanu D, Iavindrasana J, Cohen G, Platon A, Poletti PA, Müller H (2011) Fusing visual and clinical information for lung tissue classification in high-resolution computed tomography. Artif Intell Med 50:13–21
Freund Y, Schapire R (1996) Experiments with a new boosting algorithm. In: Proceedings of the 13th international conference on machine learning (ICML), pp 325–332
Galar M, Fernández A, Barrenechea E, Bustince H, Herrera F (2011) An overview of ensemble methods for binary classifiers in multi-class problems: experimental study on one-vs-one and one-vs-all schemes. Pattern Recognit 44:1761–1776
García-Nieto J, Alba E (2011) Parallel multi-swarm optimizer for gene selection in DNA microarrays. Appl Intell. doi:10.1007/s10489-011-0325-9
Gómez-Ruiz JA, Jerez-Aragonés JM, Muñoz-Pérez J, Alba-Conejo E (2004) A neural network based model for prognosis of early breast cancer. Appl Intell 20(3):231–238
Kovalev V, Harder N, Neumann B, Held M, Liebel U, Erfle H, Ellenberg J, Ellis R, Rohr K (2006) Feature selection for evaluating fluorescence microscopy images in genome-wide cell screens. In: Proceedings of the IEEE international conference on computer vision and pattern recognition (CVPR), pp 276–283
Huang H, Shen L, Ford J, Gao L, Pearlman J (2005) Early lung cancer detection based on registered perfusion MRI. J Oncol Rep 15:1080–1084
Lazebnik S, Raginsky M (2009) Supervised learning of quantizer codebooks by information loss minimization. IEEE Trans Pattern Anal Mach Intell 31(7):1294–1309
Lee Y, Lin Y, Wahba G (2004) Multicategory support vector machines: theory and application to the classification of microarray data and satellite radiance data. J Am Stat Assoc 99:67–81
Lee MC, Boroczky L, Sungur-Stasik K, Cann AD, Borczuk AC, Kawut SM, Powell CA (2010) Computer-aided diagnosis of pulmonary nodules using a two-step approach for feature selection and classifier ensemble construction. Artif Intell Med 50:43–53
Fei-Fei L, Perona P (2005) A Bayesian hierarchical model for learning natural scene categories. In: Proceedings of the IEEE international conference on computer vision and pattern recognition (CVPR), pp 524–531
Madabhushi A, Feldman MD, Metaxas DN, Tomaszewski J, Chute D (2005) Automated detection of prostatic adenocarcinoma from high-resolution ex vivo MRI. IEEE Trans Med Imaging 24(12):1611–1625
Maglogiannis I, Zafiropoulos E, Anagnostopoulos I (2009) An intelligent system for automated breast cancer diagnosis and prognosis using SVM based classifiers. Appl Intell 30:24–36
Montani S (2008) Exploring new roles for case-based reasoning in heterogeneous AI systems for medical decision support. Appl Intell 28:275–285
Mori K, Hasegawa J, Toriwaki J, Anno H, Katada K (1996) Recognition of bronchus in three-dimensional X-ray CT images with applications to virtualized bronchoscopy system. In: Proceedings of the international conference on pattern recognition (ICPR), pp 528–532
Morik K, Brochhausen P, Joachims T (1999) Combining statistical learning with a knowledge-based approach: a case study in intensive care monitoring. In: Proceedings of the 16th international conference on machine learning (ICML), pp 268–277
Own C-M (2009) Switching between type-2 fuzzy sets and intuitionistic fuzzy sets: an application in medical diagnosis. Appl Intell 31:283–291
Schölkopf B, Herbrich R, Smola AJ (2001) A generalized representer theorem. In: Proceedings of the annual conference on learning theory (COLT), pp 416–426
Sparks R, Madabhushi A (2011) Out-of-sample extrapolation using semi-supervised manifold learning (OSE-SSL): content-based image retrieval for prostate histology grading. In: Proceedings of the IEEE international symposium on biomedical imaging (ISBI), pp 734–737
Tiwari P, Kurhanewicz J, Rosen M, Madabhushi A (2010) Semi supervised multi kernel (SeSMiK) graph embedding: identifying aggressive prostate cancer via magnetic resonance imaging and spectroscopy. In: Proceedings of the international conference on medical image computing and computer-assisted intervention (MICCAI), pp 667–673
Wang J, Zucker J-D (2000) Solving the multiple-instance problem: a lazy learning approach. In: Proceedings of the 17th international conference on machine learning (ICML), pp 1119–1125
Wang D, Lim J, Han M, Lee B (2005) Learning similarity for semantic images classification. Neurocomputing 67:363–368
Yang Y, Chen S, Lin H, Ye Y (2004) A chromatic image understanding system for lung cancer cell identification based on fuzzy knowledge. In: IEA/AIE, pp 392–401
Zhang Y, Zhou Z-H (2008) Cost-sensitive face recognition. In: Proceedings of the IEEE international conference on computer vision and pattern recognition (CVPR), pp 1758–1769
Zhou Z-H, Jiang Y, Yang Y-B, Chen S-F (2002) Lung cancer cell identification based on artificial neural network ensembles. Artif Intell Med 24(1):25–36
Zhou Z-H, Liu X-Y (2006) On multi-class cost-sensitive learning. In: Proceedings of the 21st national conference on artificial intelligence (AAAI), pp 567–572
Zhu L, Zhao B, Gao Y (2008) Multi-class multi-instance learning approach for lung cancer cell classification based on bag feature selection. In: Proceedings of the 5th international conference on fuzzy systems and knowledge discovery, pp 487–492
Acknowledgements
The authors would like to acknowledge the support for this work from the National Science Foundation of China (Grant Nos. 61035003, 61175042, 61021062), the National 973 Program of China (Grant No. 2009CB320702), the 973 Program of Jiangsu, China (Grant No. BK2011005) and Program for New Century Excellent Talents in University (Grant No. NCET-10-0476). The authors wish to thank the anonymous reviewers for their valuable suggestions.
Cite this article
Shi, Y., Gao, Y., Wang, R. et al. Transductive cost-sensitive lung cancer image classification. Appl Intell 38, 16–28 (2013). https://doi.org/10.1007/s10489-012-0354-z
DOI: https://doi.org/10.1007/s10489-012-0354-z