ABSTRACT
Melanoma is one of the fastest-growing cancers in the world and can affect patients earlier in life than most other cancers. It is therefore imperative to identify patients at high risk for melanoma and enroll them in screening programs that detect the cancer early. In this study, we explore data from dermatology outpatients to build a risk model for the disease. Using millions of patient records with thousands of data points in each record, we show that a melanoma risk model can be built from real-world Electronic Health Record (EHR) data without any expert knowledge or manually engineered features. While other risk models for melanoma have been developed, this is the first to use routinely collected EHR data rather than expert features targeted specifically at melanoma. The random forest model performs as well as or better than these previous models (AUC 0.79, sensitivity 0.71, specificity 0.72), which allows larger patient populations to be screened for melanoma risk without specialized and time-consuming data collection. Important features can be extracted from the model and studied, and the features influencing a specific prediction can be explained to providers and patients. The process for building this model can be further refined to improve performance and can also be applied to risk prediction for other diseases.
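For readers less familiar with the three metrics reported above, the following is a minimal sketch of how AUC, sensitivity, and specificity are conventionally computed from a model's risk scores. It is not the study's evaluation code: the function names, the 0.5 decision threshold, and the eight toy records are all assumptions made purely for illustration.

```python
def sensitivity_specificity(y_true, y_score, threshold=0.5):
    """Return (sensitivity, specificity) at a given score threshold.

    sensitivity = TP / (TP + FN): fraction of true cases flagged as high risk
    specificity = TN / (TN + FP): fraction of non-cases correctly not flagged
    """
    tp = fn = tn = fp = 0
    for label, score in zip(y_true, y_score):
        predicted = 1 if score >= threshold else 0
        if label == 1 and predicted == 1:
            tp += 1
        elif label == 1 and predicted == 0:
            fn += 1
        elif label == 0 and predicted == 0:
            tn += 1
        else:
            fp += 1
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, y_score):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive record
    scores higher than a randomly chosen negative one (ties count 0.5).
    """
    pos = [s for label, s in zip(y_true, y_score) if label == 1]
    neg = [s for label, s in zip(y_true, y_score) if label == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = melanoma case, 0 = control; scores are made up.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_score = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1, 0.8]

sens, spec = sensitivity_specificity(y_true, y_score)
area = auc(y_true, y_score)
print(f"AUC={area:.4f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why a single model reports one AUC but a sensitivity/specificity pair tied to one operating point.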