
Automated morphological classification of lung cancer subtypes using H&E tissue images

  • Special Issue Paper
Machine Vision and Applications

Abstract

Patient-targeted therapies have recently received increasing attention. An important development in the treatment of metastatic non-small cell lung cancer (NSCLC) has been the tailoring of therapy on the basis of histology. A pathology diagnosis of “non-specified NSCLC” is no longer routinely acceptable; an effective approach for classifying the adenocarcinoma (AC) and squamous carcinoma (SC) histotypes is needed to optimize therapy. In this study, we present a robust and objective automatic computer vision system for real-time classification of AC and SC based on the morphological tissue patterns of hematoxylin and eosin (H&E) stained images, to assist medical experts in the diagnosis of lung cancer. Various original and extended densitometric and Haralick texture features are extracted from the images, and a boosting algorithm with an alternating decision tree as the base learner is used to train the classifier. For evaluation, two types of data comprising 653 tissue samples were tested: 369 samples from a tissue microarray data set and 284 samples from full-face tissue sections. Of these, 288 (about 44 %) are AC samples and 365 (about 56 %) are SC samples, so the two classes are reasonably well balanced. Using tenfold cross-validation, the technique achieved accuracies of \(92.41~\%\) on tissue microarray cores and \(95.42~\%\) on full tissue sections. We also found that the two boosting algorithms (cw-Boost and AdaBoost.M1) performed consistently well in comparison with other widely used machine learning methods, including support vector machine, neural network and decision tree. This approach offers a robust, objective and rapid procedure for optimized patient-targeted therapy.
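The evaluation protocol in the abstract (a boosted classifier scored by tenfold cross-validation) can be illustrated with a minimal sketch of binary AdaBoost.M1. This is not the authors' implementation and does not reproduce cw-Boost: it swaps in one-level decision stumps for the tree-based base learner used in the paper, assumes two classes encoded as 0 and 1 (for example AC = 0, SC = 1, a hypothetical encoding), assumes stratified folds, and all function names are hypothetical.

```python
import math
import random


def stump_predict(stump, x):
    """Apply a decision stump (feature, threshold, label_if_above) to sample x."""
    f, t, above = stump
    return above if x[f] > t else 1 - above


def fit_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted training error."""
    best, best_err = None, float("inf")
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        # Candidate thresholds: midpoints between consecutive distinct values.
        cuts = [(a + b) / 2 for a, b in zip(values, values[1:])] or values
        for t in cuts:
            for above in (0, 1):
                stump = (f, t, above)
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict(stump, xi) != yi)
                if err < best_err:
                    best, best_err = stump, err
    return best, best_err


def adaboost_m1(X, y, rounds=20):
    """Binary AdaBoost.M1 with labels 0/1; returns [(stump, vote_weight), ...]."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump, err = fit_stump(X, y, w)
        if err >= 0.5:
            break  # base learner no better than chance: AdaBoost.M1 stops here
        perfect = err == 0
        err = max(err, 1e-10)  # keep beta > 0 for a perfect stump
        beta = err / (1.0 - err)
        ensemble.append((stump, math.log(1.0 / beta)))
        if perfect:
            break  # reweighting cannot change anything further
        # Multiply weights of correctly classified samples by beta, then renormalize.
        w = [wi * (beta if stump_predict(stump, xi) == yi else 1.0)
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x):
    """Weighted vote of the stumps; ties (and an empty ensemble) go to class 0."""
    votes = [0.0, 0.0]
    for stump, weight in ensemble:
        votes[stump_predict(stump, x)] += weight
    return 0 if votes[0] >= votes[1] else 1


def stratified_folds(y, k=10, seed=0):
    """Split sample indices into k folds with roughly equal class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, yi in enumerate(y) if yi == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def cross_val_accuracy(X, y, k=10, rounds=20, seed=0):
    """k-fold accuracy: train on k-1 folds, test on the held-out fold, pool results."""
    correct = 0
    for test_idx in stratified_folds(y, k, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(X)) if i not in held_out]
        model = adaboost_m1([X[i] for i in train], [y[i] for i in train], rounds)
        correct += sum(predict(model, X[i]) == y[i] for i in test_idx)
    return correct / len(X)


if __name__ == "__main__":
    # Synthetic two-feature data standing in for image features (not the paper's data).
    rng = random.Random(1)
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(30)]
    X += [[rng.gauss(1.5, 1), rng.gauss(1.5, 1)] for _ in range(30)]
    y = [0] * 30 + [1] * 30
    print(f"tenfold CV accuracy: {cross_val_accuracy(X, y):.3f}")
```

Pooling correct predictions over all ten held-out folds gives one accuracy per data set, which is the kind of figure reported above; stratification keeps each fold close to the overall AC/SC ratio.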



Acknowledgments

This project was partially funded by Tri-Service General Hospital, Taiwan (TSGH-C101-011). The work of C.W.W. was partially funded by the NSC (101-2628-E-011-006-MY3).

Author information

Corresponding author

Correspondence to Ching-Wei Wang.

Appendix

In the calculation of Haralick features, four co-occurrence matrices are needed to describe the different orientations (\(0^\circ , 45^\circ , 90^\circ , 135^\circ \)); each is a symmetric matrix of dimension \(N_\mathrm{{g}} \times N_\mathrm{{g}}\), where \(N_\mathrm{{g}}\) is the number of possible gray levels in a particular image. The co-occurrence matrices can be represented by a three-dimensional data structure whose third dimension, \(d\), varies depending on the dimensions of the original input image. The statistical properties of the co-occurrence matrix are formulated as follows. Here, \(p(i,j)\) is the \((i,j)\)th entry of a normalized gray-tone (single-channel) spatial-dependence matrix:

$$\begin{aligned} p(i,j)=P(i,j)/R \end{aligned}$$
(3)

where \(R=\sum _{i=1}^{N_\mathrm{{g}}}\sum _{j=1}^{N_\mathrm{{g}}}P(i,j)\).
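A minimal sketch of building the symmetric co-occurrence matrices and normalizing them as in Eq. (3). It assumes a pixel distance of 1, the common row/column offsets for the four orientations (45° taken as "up and to the right"), and gray levels already quantized to 0..N_g−1 (0-based, whereas the text numbers them 1..N_g). The names `cooccurrence`, `normalize`, `cooccurrence_stack` and the parameter `distance` are hypothetical; `distance` is the pixel offset and is unrelated to the symbol \(d\) in the text. Stacking the four matrices into a list is only one possible reading of the three-dimensional structure described above.

```python
# Pixel offsets (row, col) at distance 1 for each orientation; 45 deg = up-right.
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def cooccurrence(image, levels, angle, distance=1):
    """Symmetric gray-level co-occurrence counts P(i, j) for one orientation.

    `image` is a list of rows of integer gray levels in 0..levels-1.
    """
    dr, dc = (distance * s for s in OFFSETS[angle])
    P = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                P[i][j] += 1
                P[j][i] += 1  # count the pair in both directions -> P is symmetric
    return P


def normalize(P):
    """Eq. (3): p(i, j) = P(i, j) / R, with R the sum of all entries of P."""
    R = sum(map(sum, P))
    return [[v / R for v in row] for row in P]


def cooccurrence_stack(image, levels, distance=1):
    """The four normalized matrices (0, 45, 90, 135 deg) as an N_g x N_g x 4 list."""
    return [normalize(cooccurrence(image, levels, a, distance)) for a in (0, 45, 90, 135)]


if __name__ == "__main__":
    # Classic 4 x 4, four-gray-level example image from Haralick et al. (1973).
    img = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 2, 2, 2], [2, 2, 3, 3]]
    # Expected 0-degree counts: [4, 2, 1, 0], [2, 4, 0, 0], [1, 0, 6, 1], [0, 0, 1, 2]
    for row in cooccurrence(img, 4, 0):
        print(row)
```

For this image the 0° matrix sums to \(R = 24\): each of the four rows contributes three horizontal neighbour pairs, and every pair is counted twice to make the matrix symmetric.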

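\(p_x(i)\), \(p_y(j)\): the \(i\)th and \(j\)th entries of the marginal probability matrices obtained by summing each row (over \(j\)) and each column (over \(i\)) of \(p(i,j)\), respectively. \(p_{x+y}(k)\), \(p_{x-y}(k)\): the probabilities that a co-occurring gray-level pair has sum \(i+j=k\) or absolute difference \(|i-j|=k\), respectively.

$$\begin{aligned} p_x(i)&=\sum _{j=1}^{N_\mathrm{{g}}}p(i,j), \qquad p_y(j)=\sum _{i=1}^{N_\mathrm{{g}}}p(i,j)\\ p_{x+y}(k)&=\sum _{i=1}^{N_\mathrm{{g}}}\sum _{\substack{j=1\\ i+j=k}}^{N_\mathrm{{g}}}p(i,j), \quad k=2,3,\ldots ,2N_\mathrm{{g}}\\ p_{x-y}(k)&=\sum _{i=1}^{N_\mathrm{{g}}}\sum _{\substack{j=1\\ |i-j|=k}}^{N_\mathrm{{g}}}p(i,j), \quad k=0,1,\ldots ,N_\mathrm{{g}}-1 \end{aligned}$$
(4)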
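The marginal \(p_x\) of Eq. (4), together with the distributions \(p_y\), \(p_{x+y}\) and \(p_{x-y}\) that the features use, can be computed in one pass over the matrix. This is a sketch assuming the standard Haralick definitions and 0-based gray levels; `marginals` is a hypothetical helper name.

```python
def marginals(p):
    """Return (p_x, p_y, p_x+y, p_x-y) for an N_g x N_g co-occurrence matrix.

    Gray levels are 0-based here, so p_sum[k] collects pairs with i + j = k
    (k = 0..2N_g-2, i.e. 2..2N_g in the 1-based notation of the text) and
    p_diff[k] collects pairs with |i - j| = k (k = 0..N_g-1).
    """
    ng = len(p)
    p_x = [sum(row) for row in p]                               # sum over j
    p_y = [sum(p[i][j] for i in range(ng)) for j in range(ng)]  # sum over i
    p_sum = [0.0] * (2 * ng - 1)
    p_diff = [0.0] * ng
    for i in range(ng):
        for j in range(ng):
            p_sum[i + j] += p[i][j]
            p_diff[abs(i - j)] += p[i][j]
    return p_x, p_y, p_sum, p_diff
```

Applied to the raw 0° counts `[[4, 2, 1, 0], [2, 4, 0, 0], [1, 0, 6, 1], [0, 0, 1, 2]]` of the classic 4 × 4 example image, this gives \(p_x = (7, 6, 8, 3)\) and \(p_{x-y} = (16, 6, 2, 0)\) before division by \(R = 24\); because the matrix is symmetric, \(p_y = p_x\).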

The 11 Haralick features are defined as follows.

$$\begin{aligned} f_1&= \sum _{i=1}^{N_\mathrm{{g}}}\sum _{j=1}^{N_\mathrm{{g}}}(p(i,j))^2 \end{aligned}$$
(5)
$$\begin{aligned} f_2&= \sum _{k=0}^{N_\mathrm{{g}}-1}k^2 p_{x-y}(k) \end{aligned}$$
(6)
$$\begin{aligned} f_3&= \frac{\sum _{i=1}^{N_\mathrm{{g}}}\sum _{j=1}^{N_\mathrm{{g}}}(ij)p(i,j)-\upmu _x\upmu _y}{\sigma _x\sigma _y} \end{aligned}$$
(7)

where \(\upmu _x, \upmu _y,\sigma _x, \sigma _y\) are the means and standard deviations of \(p_x\) and \(p_y\).

$$\begin{aligned} f_4&= \sum _{i=1}^{N_\mathrm{{g}}}\sum _{j=1}^{N_\mathrm{{g}}}(i-\upmu )^2p(i,j) \end{aligned}$$
(8)
$$\begin{aligned} f_5&= \sum _{i=1}^{N_\mathrm{{g}}}\sum _{j=1}^{N_\mathrm{{g}}}\frac{1}{1+(i-j)^2}p(i,j) \end{aligned}$$
(9)
$$\begin{aligned} f_6&= \sum _{i=2}^{2N_\mathrm{{g}}}ip_{x+y}(i) \end{aligned}$$
(10)
$$\begin{aligned} f_7&= \sum _{i=2}^{2N_\mathrm{{g}}}(i-f_6)^2p_{x+y}(i) \end{aligned}$$
(11)
$$\begin{aligned} f_8&= -\sum _{i=2}^{2N_\mathrm{{g}}}p_{x+y}(i)\log (p_{x+y}(i)) \end{aligned}$$
(12)
$$\begin{aligned} f_9&= -\sum _{i=1}^{N_\mathrm{{g}}}\sum _{j=1}^{N_\mathrm{{g}}}p(i,j)\log (p(i,j)) \end{aligned}$$
(13)
$$\begin{aligned} f_{10}&= \sigma ^2_{p_{x-y}} \end{aligned}$$
(14)
$$\begin{aligned} f_{11}&= -\sum _{i=0}^{N_\mathrm{{g}}-1}p_{x-y}(i)\log (p_{x-y}(i)) \end{aligned}$$
(15)
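A sketch that evaluates the eleven features for one normalized, symmetric co-occurrence matrix. It rests on several assumptions: natural logarithm, with \(0 \log 0\) taken as 0; \(\upmu\) in \(f_4\) taken as \(\upmu_x\); \(f_7\) computed about the sum average \(f_6\) (a common reading of Haralick's sum variance); \(f_{10}\) as the variance of \(p_{x-y}\); and \(f_3\) set to 0 when \(\sigma_x\sigma_y = 0\) (a constant patch). Gray levels are indexed \(1..N_\mathrm{g}\) as in the text. `haralick_features` is a hypothetical name, and how the four orientation matrices are combined into one feature vector is not stated in this section, so the sketch handles one matrix at a time.

```python
import math


def haralick_features(p):
    """Texture features f1..f11 from one normalized, symmetric co-occurrence matrix p."""
    ng = len(p)
    levels = range(1, ng + 1)  # 1-based gray levels, as in the text

    # Marginal distributions p_x, p_y, p_{x+y}, p_{x-y}.
    p_x = [sum(row) for row in p]
    p_y = [sum(p[i][j] for i in range(ng)) for j in range(ng)]
    p_sum = [0.0] * (2 * ng - 1)
    p_diff = [0.0] * ng
    for i in range(ng):
        for j in range(ng):
            p_sum[i + j] += p[i][j]
            p_diff[abs(i - j)] += p[i][j]
    cells = [(i + 1, j + 1, p[i][j]) for i in range(ng) for j in range(ng)]
    sums = [(k + 2, v) for k, v in enumerate(p_sum)]  # (i + j, prob), i + j = 2..2N_g
    diffs = list(enumerate(p_diff))                   # (|i - j|, prob)

    def entropy(values):
        # Natural log; zero-probability terms contribute 0 (0 log 0 := 0).
        return -sum(v * math.log(v) for v in values if v > 0)

    mu_x = sum(i * v for i, v in zip(levels, p_x))
    mu_y = sum(j * v for j, v in zip(levels, p_y))
    sd_x = math.sqrt(sum((i - mu_x) ** 2 * v for i, v in zip(levels, p_x)))
    sd_y = math.sqrt(sum((j - mu_y) ** 2 * v for j, v in zip(levels, p_y)))

    f = {}
    f[1] = sum(v * v for _, _, v in cells)                   # angular second moment
    f[2] = sum(k * k * v for k, v in diffs)                  # contrast
    cov = sum(i * j * v for i, j, v in cells) - mu_x * mu_y
    f[3] = cov / (sd_x * sd_y) if sd_x * sd_y > 0 else 0.0  # correlation
    f[4] = sum((i - mu_x) ** 2 * v for i, _, v in cells)     # sum of squares: variance
    f[5] = sum(v / (1 + (i - j) ** 2) for i, j, v in cells)  # inverse difference moment
    f[6] = sum(k * v for k, v in sums)                       # sum average
    f[7] = sum((k - f[6]) ** 2 * v for k, v in sums)         # sum variance (about f6)
    f[8] = entropy(v for _, v in sums)                       # sum entropy
    f[9] = entropy(v for _, _, v in cells)                   # entropy
    mean_diff = sum(k * v for k, v in diffs)
    f[10] = sum((k - mean_diff) ** 2 * v for k, v in diffs)  # difference variance
    f[11] = entropy(v for _, v in diffs)                     # difference entropy
    return f


if __name__ == "__main__":
    counts = [[4, 2, 1, 0], [2, 4, 0, 0], [1, 0, 6, 1], [0, 0, 1, 2]]
    p = [[c / 24 for c in row] for row in counts]  # Eq. (3) with R = 24
    for n, value in sorted(haralick_features(p).items()):
        print(f"f{n} = {value:.4f}")
```

For the example matrix, \(f_1 = 84/576\) and \(f_6 = 110/24\); the sum average equals \(\upmu_x + \upmu_y\), which offers a quick consistency check on the 1-based indexing.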

About this article

Cite this article

Wang, CW., Yu, CP. Automated morphological classification of lung cancer subtypes using H&E tissue images. Machine Vision and Applications 24, 1383–1391 (2013). https://doi.org/10.1007/s00138-012-0457-x
