Abstract
The TNM staging system is used worldwide to classify cancer. Its main limitation is that it generates stage groups from only three factors: tumor size, extent of spread to lymph nodes, and status of distant metastasis. To describe cancer more accurately, and thus improve patient care, additional factors should be used in the classification. In this paper we propose a hierarchical clustering algorithm for developing prognostic systems that classify cancer according to multiple prognostic factors. Because it allows more prognostic factors to be incorporated, the algorithm has many potential applications in augmenting the data currently captured by a staging system. The algorithm clusters combinations of prognostic factors, where each combination is formed from categories of the factors. The dissimilarity between two combinations is defined as the area between their corresponding survival curves. Cutting the resulting dendrogram yields groups of combinations; these groups, together with their survival curves, define a prognostic system that classifies patients by survival outcome. We demonstrate the proposed algorithm on patients with breast cancer from the Surveillance, Epidemiology, and End Results (SEER) Program of the National Cancer Institute.
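As a rough illustration of the procedure summarized above, the following Python sketch estimates a Kaplan-Meier survival curve for each combination of factor categories, uses the area between two curves as their dissimilarity, builds a dendrogram, and cuts it into groups. It is not the authors' R implementation. The function names, the combination labels, the synthetic exponential survival times, the time horizon, and the choice of average linkage are all assumptions made only for this example.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform


def kaplan_meier(times, events):
    """Return (event_times, survival) for the Kaplan-Meier estimate."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    event_times = np.unique(times[events])
    surv, s = [], 1.0
    for t in event_times:
        at_risk = np.sum(times >= t)             # still under observation at t
        deaths = np.sum((times == t) & events)   # observed events at t
        s *= 1.0 - deaths / at_risk
        surv.append(s)
    return event_times, np.array(surv)


def step_value(event_times, surv, grid):
    """Evaluate the right-continuous Kaplan-Meier step function on grid."""
    idx = np.searchsorted(event_times, grid, side="right")
    return np.concatenate(([1.0], surv))[idx]


def area_between(km_a, km_b, horizon):
    """Exact area between two step curves on [0, horizon]."""
    knots = np.unique(np.concatenate(([0.0, horizon], km_a[0], km_b[0])))
    knots = knots[knots <= horizon]
    left, widths = knots[:-1], np.diff(knots)
    gap = np.abs(step_value(*km_a, left) - step_value(*km_b, left))
    return float(np.sum(gap * widths))


def cluster_combinations(groups, horizon, n_groups, method="average"):
    """Cluster combinations by area between their survival curves."""
    names = sorted(groups)
    curves = [kaplan_meier(*groups[name]) for name in names]
    k = len(names)
    dist = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            dist[i, j] = dist[j, i] = area_between(curves[i], curves[j], horizon)
    tree = linkage(squareform(dist), method=method)  # the dendrogram
    labels = fcluster(tree, t=n_groups, criterion="maxclust")  # cut it
    return {name: int(lab) for name, lab in zip(names, labels)}, tree


def synthetic_group(rng, mean_years, n=300, censor_at=8.0):
    """Hypothetical data: exponential survival, censored at a fixed time."""
    t = rng.exponential(mean_years, size=n)
    return np.minimum(t, censor_at), t <= censor_at


rng = np.random.default_rng(0)
# Hypothetical combinations of T, N, and grade categories.
groups = {
    "T1,N0,G1": synthetic_group(rng, 10.0),
    "T1,N0,G2": synthetic_group(rng, 10.0),
    "T2,N1,G2": synthetic_group(rng, 2.0),
    "T2,N1,G3": synthetic_group(rng, 2.0),
}
labels, tree = cluster_combinations(groups, horizon=5.0, n_groups=2)
print(labels)
```

In this toy setting the two long-survival combinations should fall into one group and the two short-survival combinations into the other. With real registry data, the number of groups and the linkage method would need to be chosen and validated rather than fixed in advance as they are here.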








Acknowledgments
This work was partially supported by the grant "Using Dendrograms to Create Prognostic Systems for Cancer," sponsored by the John P. Murtha Cancer Center Research Program. Arnold Schwartz was supported by the grant "Prognostic Markers in Early Stage Lung Cancer: Computer Algorithms and Bayesian Regression," sponsored by the Dr. Cyrus and Myrtle Katzen Cancer Research Grant Award at The George Washington University. Disclaimer: The views expressed are those of the authors and do not necessarily reflect the official views of the Uniformed Services University of the Health Sciences, the Department of Defense, or the U.S. Government. Note: Readers interested in the R code of the EACCD may contact the corresponding author.
Additional information
This article is part of the Topical Collection on Mobile Systems
Cite this article
Chen, D., Wang, H., Sheng, L. et al. An Algorithm for Creating Prognostic Systems for Cancer. J Med Syst 40, 160 (2016). https://doi.org/10.1007/s10916-016-0518-1
DOI: https://doi.org/10.1007/s10916-016-0518-1