Skip to main content

Advertisement

Log in

Predicting Metastasis in Breast Cancer: Comparing a Decision Tree with Domain Experts

  • Published:
Journal of Medical Systems Aims and scope Submit manuscript

Abstract

Breast malignancy is the second most common cause of cancer death among women in Western countries. Identifying high-risk patients is vital in order to provide them with specialized treatment. In some situations, such as when access to experienced oncologists is not possible, decision support methods can be helpful in predicting the recurrence of cancer. Three thousand six hundred ninety-nine breast cancer patients admitted in south-east Sweden from 1986 to 1995 were studied. A decision tree was trained with all patients except for 100 cases and tested with those 100 cases. Two domain experts were asked for their opinions about the probability of recurrence of a certain outcome for these 100 patients. ROC curves, area under the ROC curves, and calibration for predictions were computed and compared. After comparing the predictions from a model built by data mining with predictions made by two domain experts, no significant differences were noted. In situations where experienced oncologists are not available, predictive models created with data mining techniques can be used to support physicians in decision making with acceptable accuracy.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5

Similar content being viewed by others

References

  1. Sakorafas, G. H., Krespis, E., and Pavlakis, G., Risk estimation for breast cancer development; a clinical perspective. Surg. Oncol. 10(4):183–192, 2002 May.

    Article  Google Scholar 

  2. Fieschi, M., Dufour, J. C., Staccini, P., Gouvernet, J., and Bouhaddou, O., Medical decision support systems: Old dilemmas and new paradigms? Methods Inf. Med. 42(3):190–198, 2003.

    Google Scholar 

  3. Fayyad, U., PiatetskyShapiro, G., and Smyth, P., From data mining to knowledge discovery in databases. AI Mag. 17(3):37–54, 1996 Fal.

    Google Scholar 

  4. Han, J., and Kamber, M., Data mining concepts and techniques. San Francisco: Morgan Kaufmann, 2001.

    Google Scholar 

  5. Quinlan, J. R., C4.5: Programs for machine learning. San Mateo, CA: Morgan Kaufmann, 1993.

    Google Scholar 

  6. Podgorelec, V., Kokol, P., Stiglic, B., and Rozman, I., Decision trees: An overview and their use in medicine. J. Med. Syst. 26(5):445–463, 2002 Oct.

    Article  Google Scholar 

  7. Delen, D., Walker, G., and Kadam, A., Predicting breast cancer survivability: A comparison of three data mining methods. Artif. Intell. Med. 34(2):113–127, 2005 Jun.

    Article  Google Scholar 

  8. Vlahou, A., Schorge, J. O., Gregory, B. W., and Coleman, R. L., Diagnosis of ovarian cancer using decision tree classification of mass spectral data. J. Biomed. Biotechnol. 4(5):308–314, 2003 Dec.

    Article  Google Scholar 

  9. Gerald, L. B., Tang, S., Bruce, F., Redden, D., Kimerling, M. E., Brook, N., et al., A decision tree for tuberculosis contact investigation. Am. J. Respir. Crit. Care Med. 166(8):1122–1127, 2002 Oct.

    Article  Google Scholar 

  10. Atlas, L., Cole, R., Muthusamy, Y., Lippman, A., Connor, J., Park, D., et al., A performance comparison of trained multilayer perceptrons and trained classification trees. IEEE International Conference on Systems, Man and Cybernetics; 1989 Oct. Cambridge, MA, USA: Institute of Electrical and Electronic Engineers, pp. 1614–1619, 1989.

    Google Scholar 

  11. Brown, D. E., Corruble, V., and Pittard, C. L., A comparison of decision tree classifiers with backpropagation neural networks for multimodal classification problems. Pattern Recogn. 26(6):953–961, 1993 Jun.

    Article  Google Scholar 

  12. Talmon, J., Dassen, R., and Karthaus, V., Neural nets and classification trees: A comparison in the domain of ECG analysis. In: Gelsema, E. S., and Kanal, L. N., (Eds.), Pattern Recognition in Practice IV: Multiple Paradigms, Comparative Studies and Hybrid Systems; 1994. The Netherlands: Vlieland, pp. 415–423, 1994.

    Google Scholar 

  13. Esposito, F., Malerba, D., and Semeraro, G., A comparative analysis of methods for pruning decision trees. IEEE Trans. Pattern Anal. Machine Intel. 19(5):476–491, 1997 May.

    Article  Google Scholar 

  14. Mehrotra, J., Vali, M., McVeigh, M., Kominsky, S. L., Fackler, M. J., Lahti-Domenici, J., et al., Very high frequency of hypermethylated genes in breast cancer metastasis to the bone, brain, and lung. Clin. Cancer Res. 10(9):3104–3109, 2004 May.

    Article  Google Scholar 

  15. Wenger, C. R., and Clark, G. M., S-phase fraction and breast cancer—a decade of experience. Breast Cancer Res. Treatment 51(3):255–265, 1998.

    Article  Google Scholar 

  16. Sundquist, M., Thorstenson, S., Brudin, L., Wingren, S., and Nordenskjold, B., Incidence and prognosis in early onset breast cancer. Breast 11(1):30–35, 2002 Feb.

    Article  Google Scholar 

  17. Adami, H. O., Graffman, S., Johansson, H., and Rimsten, A., Survival and recurrences five years after selective treatment for breast carcinoma. Br. J. Cancer 38(5):624–630, 1978 Nov.

    Google Scholar 

  18. Sundquist, M., Thorstenson, S., Brudin, L., and Nordenskjold, B., Applying the Nottingham Prognostic Index to a Swedish breast cancer population. South East Swedish Breast Cancer Study Group. Breast Cancer Res. Treat. 53(1):1–8, 1999 Jan.

    Article  Google Scholar 

  19. Ciocca, D. R., and Elledge, R., Molecular markers for predicting response to tamoxifen in breast cancer patients. Endocrine 13(1):1–10, 2000 Aug.

    Article  Google Scholar 

  20. Lyman, G. H., Lyman, S., Balducci, L., Kuderer, N., Reintgen, D., Cox, C., et al., Age and the risk of breast cancer recurrence. Cancer Control 3(5):421–427, 1996 Oct.

    Google Scholar 

  21. Razavi, A. R., Gill, H., Stal, O., Sundquist, M., Thorstenson, S., Ahlfeldt, H., et al., Exploring cancer register data to find risk factors for recurrence of breast cancer—Application of Canonical Correlation Analysis. BMC Med. Inf. Decis. Mak. 5:29, 2005 Aug.

    Article  Google Scholar 

  22. Tejler, G., Norberg, B., Dufmats, M., and Nordenskjold, B., Survival after treatment for breast cancer in a geographically defined population. Br. J. Surg. 91(10):1307–1312, 2004 Oct.

    Article  Google Scholar 

  23. Piatetskyshapiro, G., Knowledge discovery in databases. IEEE Intell. Syst. Appl. 6(5):74–76, 1991 Oct.

    Google Scholar 

  24. Lavrac, N., Selected techniques for data mining in medicine. Artif. Intell. Med. 16(1):3–23, 1999 May.

    Article  Google Scholar 

  25. Frawley, W. J., Piatetsky-Shapiro, G., and Matheus, C. J., Knowledge discovery in databases—An overview. AI Mag. 13:57–70, 1992.

    Google Scholar 

  26. Hand, D. J., Smyth, P., and Mannila, H., Principles of data mining. Cambridge: MIT Press, 2001.

    Google Scholar 

  27. Razavi, A. R., Gill, H., Åhlfeldt, H., and Shahsavar, N., A data pre-processing method to increase efficiency and accuracy in data mining. In: Miksch, S., Hunter, J., and Keravnou, E., (Eds.), 10th Conference on Artificial Intelligence in Medicine; 2005 July 23–27. Aberdeen, UK: Springer-Verlag GmbH, pp. 434–443, 2005.

    Google Scholar 

  28. Rubin, D. B., and Schenker, N., Multiple imputation in health-care databases—An overview and some applications. Stat. Med. 10(4):585–598, 1991 Apr.

    Article  Google Scholar 

  29. Schafer, J. L., Analysis of incomplete multivariate data. London: Chapman & Hall, 1997.

    MATH  Google Scholar 

  30. McLachlan, G. J., and Krishnan, T., The EM algorithm and extensions. New York: Wiley, 1997.

    MATH  Google Scholar 

  31. Burke, H. B., Goodman, P. H., Rosen, D. B., Henson, D. E., Weinstein, J. N., Harrell, F. E. Jr., et al., Artificial neural networks improve the accuracy of cancer survival prediction. Cancer 79(4):857–862, 1997 Feb.

    Article  Google Scholar 

  32. Luo, Y., and Lin, S., Information gain for genetic parameter estimation with incorporation of marker data. Biometrics 59(2):393–401, 2003 Jun.

    Article  MathSciNet  Google Scholar 

  33. Zorman, M., Eich, H. P., Stiglic, B., Ohmann, C., and Lenic, M., Does size really matter-using a decision tree approach for comparison of three different databases from the medical field of acute appendicitis. J. Med. Syst. 26(5):465–477, 2002 Oct.

    Article  Google Scholar 

  34. Witten, I. H., and Frank, E., Data mining: Practical machine learning tools with Java implementations. San Francisco: Morgan Kaufmann, 2000.

    Google Scholar 

  35. Stone, M., Cross-validation choice and assessment of statistical predictions. J. Royal Stat. Soc. Ser. B 36:111–147, 1974.

    MATH  Google Scholar 

  36. Bradley, A. P., The use of the area under the roc curve in the evaluation of machine learning algorithms. Pattern Recogn. 30(7):1145–1159, 1997 Jul.

    Article  Google Scholar 

  37. Holmes, J. H., Quantitative methods for evaluating learning classifier system performance in forced two-choice decision tasks. 2nd International Workshop on Learning Classifier Systems. pp. 250–257, 1999.

  38. Ling, C. X., Huang, J., and Zhang, H., AUC: A better measure than accuracy in comparing learning algorithms. Adv. Artif. Intell. Proc. 2671:329–341, 2003.

    MathSciNet  Google Scholar 

  39. Hosmer, D. W., and Lemeshow, S., Applied logistic regression. New York: Wiley, 1989.

    Google Scholar 

  40. Jaimes, F., Farbiarz, J., Alvarez, D., and Martinez, C., Comparison between logistic regression and neural networks to predict death in patients with suspected sepsis in the emergency room. Crit. Care 9(2):R150–R156, 2005 Apr.

    Article  Google Scholar 

  41. Duhamel, A., Nuttens, M. C., Devos, P., Picavet, M., and Beuscart, R., A preprocessing method for improving data mining techniques. Application to a large medical diabetes database. Stud. Health Technol. Inf. 95:269–274, 2003.

    Google Scholar 

  42. Chawla, N. V., Bowyer, K. W., Hall, L. O., and Kegelmeyer, W. P., SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res. 16:321–357, 2002.

    MATH  Google Scholar 

  43. Crockett, K., Bandar, Z., and O’Shea, J., On producing balanced fuzzy decision tree classifiers. pp. 1756, 2006.

Download references

Acknowledgment

This study was supported by grant no. F2003-513 from FORSS, the Health Research Council in the South-East of Sweden. Special thanks are due to the South-East Swedish Breast Cancer Study Group for fruitful collaboration and support in this study.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Amir R. Razavi.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Razavi, A.R., Gill, H., Åhlfeldt, H. et al. Predicting Metastasis in Breast Cancer: Comparing a Decision Tree with Domain Experts. J Med Syst 31, 263–273 (2007). https://doi.org/10.1007/s10916-007-9064-1

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10916-007-9064-1

Keywords

Navigation