Abstract
Breast malignancy is the second most common cause of cancer death among women in Western countries. Identifying high-risk patients is vital so that they can be given specialized treatment. In some situations, such as when experienced oncologists are not accessible, decision support methods can help predict the recurrence of cancer. The study covered 3,699 breast cancer patients admitted in south-east Sweden from 1986 to 1995. A decision tree was trained on all patients except 100 cases and tested on those 100 cases. Two domain experts were asked to estimate the probability of recurrence for the same 100 patients. ROC curves, the areas under the ROC curves, and the calibration of the predictions were computed and compared. No significant differences were found between the predictions of the model built by data mining and those made by the two domain experts. In situations where experienced oncologists are not available, predictive models created with data mining techniques can support physicians in decision making with acceptable accuracy.
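The comparison described in the abstract rests on two measures: discrimination (area under the ROC curve) and calibration. The following is a minimal sketch of how such measures might be computed for two sets of predictions on the same held-out cases. It is not the authors' implementation: the function names, the equal-sized binning used for calibration, and the eight toy cases are all assumptions made for illustration, and the abstract does not state which calibration statistic was used.

```python
from typing import List, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    Over all (positive, negative) pairs, counts how often the positive
    case receives the higher score; ties count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def calibration_table(
    labels: Sequence[int], scores: Sequence[float], n_bins: int = 2
) -> List[Tuple[float, float, int]]:
    """Sort cases by predicted probability, split them into n_bins groups of
    (nearly) equal size, and return (mean predicted, observed rate, count)
    for each group. A well-calibrated predictor has similar values in the
    first two columns. The binning scheme is an illustrative assumption.
    """
    ranked = sorted(zip(scores, labels))
    n = len(ranked)
    rows = []
    for b in range(n_bins):
        chunk = ranked[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue
        mean_pred = sum(s for s, _ in chunk) / len(chunk)
        observed = sum(y for _, y in chunk) / len(chunk)
        rows.append((mean_pred, observed, len(chunk)))
    return rows


# Hypothetical toy data (NOT from the study): 1 = recurrence, 0 = none.
labels = [0, 0, 1, 0, 1, 1, 0, 1]
tree_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]    # model output
expert_scores = [0.2, 0.1, 0.5, 0.5, 0.5, 0.9, 0.3, 0.8]  # expert estimate

print("tree AUC:  ", auc(labels, tree_scores))
print("expert AUC:", auc(labels, expert_scores))
for mean_pred, observed, count in calibration_table(labels, tree_scores):
    print(f"predicted {mean_pred:.2f}  observed {observed:.2f}  n={count}")
```

Deciding whether two AUC values differ significantly, as the study does, requires an additional statistical test that this sketch does not include.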
Acknowledgment
This study was supported by grant no. F2003-513 from FORSS, the Health Research Council in the South-East of Sweden. Special thanks are due to the South-East Swedish Breast Cancer Study Group for fruitful collaboration and support in this study.
Cite this article
Razavi, A.R., Gill, H., Åhlfeldt, H. et al. Predicting Metastasis in Breast Cancer: Comparing a Decision Tree with Domain Experts. J Med Syst 31, 263–273 (2007). https://doi.org/10.1007/s10916-007-9064-1
DOI: https://doi.org/10.1007/s10916-007-9064-1