Abstract
Patient-targeted therapies have recently gained prominence. In the treatment of metastatic non-small cell lung cancer (NSCLC), an important development has been the tailoring of therapy to tumor histology. A pathological diagnosis of “non-specified NSCLC” is therefore no longer routinely acceptable, and an effective approach for distinguishing the adenocarcinoma (AC) and squamous cell carcinoma (SC) histotypes is needed to optimize therapy. In this study, we present a robust and objective automatic computer vision system for real-time classification of AC and SC based on the morphological tissue patterns in hematoxylin and eosin (H&E) stained images, intended to assist medical experts in the diagnosis of lung cancer. Image features are extracted using a range of original and extended densitometric features and Haralick texture features, and the classifier is trained with a boosting algorithm that uses an alternating decision tree as the base learner. For evaluation, 653 tissue samples of two types were tested: 369 from a tissue microarray data set and 284 from full-face tissue sections. The two classes are well balanced, with 288 AC samples (about 44 %) and 365 SC samples (about 56 %). Under tenfold cross-validation, the technique achieved accuracies of 92.41 % on tissue microarray cores and 95.42 % on full tissue sections. The two boosting algorithms (cw-Boost and AdaBoost.M1) also performed consistently well in comparison with other widely adopted machine learning methods, including support vector machines, neural networks and decision trees. The approach offers a robust, objective and rapid procedure in support of optimized patient-targeted therapy.
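The boosting step can be illustrated with a minimal sketch of two-class AdaBoost.M1 (Freund and Schapire), written in the equivalent ±1-label form with vote weight α = ½ ln((1 − ε)/ε), where ε is the weighted training error of the weak learner. This is not the authors' classifier: one-feature decision stumps stand in for the alternating decision tree base learner, cw-Boost is not shown, the tenfold split is a plain interleaved (unstratified) split, and all function names are hypothetical.

```python
import math
from typing import List, Tuple

# A decision stump: (feature index, threshold, polarity).
# It predicts `polarity` when x[feature] <= threshold, and -polarity otherwise.
Stump = Tuple[int, float, int]


def stump_predict(stump: Stump, x: List[float]) -> int:
    f, thr, pol = stump
    return pol if x[f] <= thr else -pol


def fit_stump(X: List[List[float]], y: List[int], w: List[float]) -> Tuple[Stump, float]:
    """Exhaustively search for the stump with the lowest weighted error."""
    best: Stump = (0, 0.0, 1)
    best_err = float("inf")
    for f in range(len(X[0])):
        for thr in sorted({x[f] for x in X}):
            for pol in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict((f, thr, pol), xi) != yi)
                if err < best_err:
                    best, best_err = (f, thr, pol), err
    return best, best_err


def adaboost_m1(X, y, rounds: int = 10) -> List[Tuple[float, Stump]]:
    """Two-class AdaBoost.M1; labels must be -1 or +1 (e.g. hypothetically AC = -1, SC = +1)."""
    n = len(X)
    w = [1.0 / n] * n  # start from uniform sample weights
    ensemble: List[Tuple[float, Stump]] = []
    for _ in range(rounds):
        stump, err = fit_stump(X, y, w)
        if err >= 0.5:  # M1 stops when the weak learner is no better than chance
            break
        alpha = 0.5 * math.log((1 - err) / max(err, 1e-10))  # vote weight of this stump
        ensemble.append((alpha, stump))
        if err == 0.0:  # a perfect stump: further rounds would add nothing
            break
        # Up-weight misclassified samples, down-weight correct ones, then renormalise.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x: List[float]) -> int:
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * stump_predict(s, x) for a, s in ensemble)
    return 1 if score >= 0 else -1


def kfold_accuracy(X, y, k: int = 10, rounds: int = 10) -> float:
    """Plain (unstratified) k-fold cross-validation with an interleaved split."""
    correct = 0
    for fold in range(k):
        test = set(range(fold, len(X), k))
        train = [i for i in range(len(X)) if i not in test]
        model = adaboost_m1([X[i] for i in train], [y[i] for i in train], rounds)
        correct += sum(predict(model, X[i]) == y[i] for i in test)
    return correct / len(X)
```

In use, `X` would hold one feature vector per tissue sample (for example densitometric and Haralick features) and `y` the corresponding ±1 class labels; `kfold_accuracy(X, y, k=10)` then gives a tenfold cross-validated accuracy for this simplified learner.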




References
Argiris, A., Gadgeel, S.M., Dacic, S.: Subdividing NSCLC: reflections on the past, present, and future of lung cancer therapy. Oncology 23, 1–4 (2009)
Ambroise, C., McLachlan, G.J.: Selection bias in gene extraction on the basis of microarray gene-expression data. Proc. Natl. Acad. Sci. USA 99(10), 6562–6566 (2002)
American Cancer Society: http://www.cancer.org/. Accessed 9 June 2009
Bishop, C.M.: Neural Networks for Pattern Recognition. Oxford University Press, Inc., New York (1995)
Chen, W., Foran, D.J.: Advances in cancer tissue microarray technology: towards improved understanding and diagnostics. Anal. Chim. Acta 564, 74–81 (2006)
Chapman, J., Miller, N., Lickley, H., Qian, J., Christens-Barry, W., Fu, Y., Yuan, Y., Axelrod, D.: Ductal carcinoma in situ of the breast (DCIS) with heterogeneity of nuclear grade: prognostic effects of quantitative nuclear assessment. BMC Cancer 7(1), 174 (2007)
Dubey, S., Powell, C.A.: Update in lung cancer 2008. Am. J. Respir. Crit. Care Med. 179(10), 860–868 (2009)
Edwards, S.L., Roberts, C., McKean, M.A., Cockburn, J.S., Jeffrey, R.R., Kerr, K.M.: Pre-operative histological classification of primary lung cancer: accuracy of diagnosis and use of the non-small cell carcinomas. J. Clin. Pathol. 53, 537–540 (2000)
Freund, Y., Mason, L.: The alternating decision tree learning algorithm. In: Proceedings of the 16th International Conference on Machine Learning, pp. 124–133. Morgan Kaufmann, San Francisco (1999)
Freund, Y., Schapire, R.E.: Experiments with a new boosting algorithm. In: Proceedings of the Thirteenth International Conference on Machine Learning, pp. 148–156 (1996)
Grilley-Olson, J.E., Hayes, D.N., Qaqish, B.F., Moore, D.T., Socinski, M.A., Yin, X., Travis, W.D., Funkhouser, W.K., et al.: Validation of inter-observer agreement in lung cancer assessment. J. Clin. Oncol. 27, 15s (2009)
Haralick, R.M., Shanmugam, K., Dinstein, I.: Textural features for image classification. IEEE Trans. Syst. Man Cybern. 3(6), 610–621 (1973)
Keerthi, S.S., Shevade, S.K., Bhattacharyya, C., Murthy, K.R.K.: Improvements to Platt’s SMO algorithm for SVM classifier design. Neural Comput. 13(3), 637–649 (2001)
Kohavi, R.: A study of cross-validation and bootstrap for accuracy estimation and model selection. In: IJCAI, pp. 1137–1145 (1995)
Otsu, N.: A threshold selection method from gray-level histograms. IEEE Trans. Syst. Man Cybern. 9(1), 62–66 (1979)
Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann Series in Machine Learning. Morgan Kaufmann, San Francisco (1993)
Sandler, A., Gray, R., Perry, M.C., Brahmer, J., Schiller, J.H., Dowlati, A., Lilenbaum, R., Johnson, D.H.: Paclitaxel-carboplatin alone or with bevacizumab for non-small-cell lung cancer. N. Engl. J. Med. 355(24), 2542–2550 (2006)
Selvaggi, G.: Histologic subtype in NSCLC: does it matter? Oncology 23, 1–11 (2009)
Ullmann, R., Morbini, P., Halbwedl, I., Bongiovanni, M., Gogg-Kammerer, M., Papotti, M., Gabor, S., Renner, H., Popper, H.H.: Protein expression profiles in adenocarcinomas and squamous cell carcinomas of the lung generated using tissue microarrays. J. Pathol. 203(3), 798–807 (2004)
Wallace, W.: The challenge of classifying poorly differentiated tumours in the lung. Histopathology 54, 28–42 (2009)
Wang, C.-W., Hunter, A.: A low variance error boosting algorithm. Appl. Intell. 33, 357–369 (2009)
Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques, 2nd edn. Morgan Kaufmann Series in Data Management Systems. Morgan Kaufmann, San Francisco (2005)
Zapotoczny, P., Zielinska, M., Nita, Z.: Application of image analysis for the varietal classification of barley: morphological features. J. Cereal Sci. 48(1), 104–110 (2008)
Acknowledgments
This project was partially funded by Tri-Service General Hospital, Taiwan (TSGH-C101-011). The work of C.W.W. was partially funded by the National Science Council (NSC), Taiwan (101-2628-E-011-006-MY3).
Appendix
In the calculation of the Haralick features, four co-occurrence matrices are needed to describe the different orientations (\(0^\circ , 45^\circ , 90^\circ , 135^\circ \)). Each is a symmetric \(N_\mathrm{g} \times N_\mathrm{g}\) matrix, where \(N_\mathrm{g}\) is the number of possible gray levels in a particular image. The co-occurrence matrices can be stored in a three-dimensional data structure whose third dimension, \(d\), varies depending on the dimensions of the original input image. The statistical properties of the co-occurrence matrices are formulated as follows. \(p(i,j)\) is defined as the \((i,j)\)th entry of a normalized single-channel (gray-tone) spatial-dependence matrix.
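\(p(i,j) = P(i,j)/R\), where \(P(i,j)\) is the raw count of co-occurring gray-level pairs \((i,j)\) and \(R=\sum _{i=1}^{N_\mathrm{g}}\sum _{j=1}^{N_\mathrm{g}}P(i,j)\) is the normalizing constant.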
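\(p_x(i)=\sum _{j=1}^{N_\mathrm{g}} p(i,j)\): the \(i\)th entry in the marginal probability vector obtained by summing the rows of \(p(i,j)\); analogously, \(p_y(j)=\sum _{i=1}^{N_\mathrm{g}} p(i,j)\) is obtained by summing the columns.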
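The matrix construction and normalization above can be sketched as follows. This is a minimal pure-Python illustration, not the authors' implementation: it assumes the image has already been quantized to integer gray levels 0 to \(N_\mathrm{g}-1\), uses one common (row, column) offset convention for the four orientations, assumes a pixel distance of 1 by default, and the names `glcm`, `marginals` and `OFFSETS` are hypothetical.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]

# Assumed (row, col) offsets for the four orientations at unit distance.
OFFSETS: Dict[int, Tuple[int, int]] = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def glcm(image: List[List[int]], n_gray: int, angle: int, distance: int = 1) -> Matrix:
    """Symmetric, normalised co-occurrence matrix p(i, j) for one orientation."""
    dr, dc = OFFSETS[angle]
    dr, dc = dr * distance, dc * distance
    P = [[0.0] * n_gray for _ in range(n_gray)]  # raw counts P(i, j)
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                # Count both (i, j) and (j, i) so that P is symmetric.
                P[i][j] += 1
                P[j][i] += 1
    R = sum(map(sum, P))  # R = sum over all i, j of P(i, j)
    if R == 0:
        raise ValueError("image too small for this offset")
    return [[v / R for v in row] for row in P]


def marginals(p: Matrix) -> Tuple[List[float], List[float]]:
    """Return p_x (row sums) and p_y (column sums) of p(i, j)."""
    n = len(p)
    px = [sum(p[i]) for i in range(n)]
    py = [sum(p[i][j] for i in range(n)) for j in range(n)]
    return px, py
```

Calling `glcm(img, n_gray, a)` for each `a` in `(0, 45, 90, 135)` yields the four orientation matrices; because each is symmetric, `p_x` and `p_y` coincide.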
The 11 Haralick features are defined as follows.
where \(\mu _x, \mu _y,\sigma _x, \sigma _y\) are the means and standard deviations of \(p_x\) and \(p_y\).
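As an illustration of how such features are computed from a normalized matrix \(p(i,j)\), the sketch below evaluates five of the standard Haralick et al. (1973) features: angular second moment, contrast, correlation (using \(\mu_x, \mu_y, \sigma_x, \sigma_y\) as defined above), inverse difference moment and entropy. It is not necessarily the exact set or form of the 11 features used in the paper; the function name, the use of the natural logarithm in the entropy, and returning 0 for correlation when a standard deviation is zero are assumptions.

```python
import math
from typing import Dict, List


def haralick_subset(p: List[List[float]]) -> Dict[str, float]:
    """A few standard Haralick features of a normalised co-occurrence matrix p(i, j)."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    # Marginal distributions and their means / standard deviations.
    px = [sum(p[i]) for i in range(n)]
    py = [sum(p[i][j] for i in range(n)) for j in range(n)]
    mu_x = sum(i * v for i, v in enumerate(px))
    mu_y = sum(j * v for j, v in enumerate(py))
    sd_x = math.sqrt(sum((i - mu_x) ** 2 * v for i, v in enumerate(px)))
    sd_y = math.sqrt(sum((j - mu_y) ** 2 * v for j, v in enumerate(py)))
    cov = sum(i * j * v for i, j, v in cells) - mu_x * mu_y
    return {
        # Angular second moment: sum of p(i, j)^2 (high for homogeneous texture).
        "asm": sum(v * v for _, _, v in cells),
        # Contrast: sum of (i - j)^2 p(i, j) (large when neighbours differ strongly).
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        # Correlation: (sum of i j p(i, j) - mu_x mu_y) / (sigma_x sigma_y).
        "correlation": cov / (sd_x * sd_y) if sd_x > 0 and sd_y > 0 else 0.0,
        # Inverse difference moment: sum of p(i, j) / (1 + (i - j)^2).
        "idm": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
        # Entropy: -sum of p(i, j) log p(i, j), skipping empty cells.
        "entropy": -sum(v * math.log(v) for _, _, v in cells if v > 0),
    }
```

For a perfectly diagonal matrix such as `[[0.5, 0.0], [0.0, 0.5]]` the contrast is 0 and the correlation is 1, whereas moving the mass off the diagonal raises the contrast and makes the correlation negative.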
Cite this article
Wang, CW., Yu, CP. Automated morphological classification of lung cancer subtypes using H&E tissue images. Machine Vision and Applications 24, 1383–1391 (2013). https://doi.org/10.1007/s00138-012-0457-x