DOI: 10.1145/3233547.3233561
short-paper

Melanoma Risk Prediction with Structured Electronic Health Records

Published: 15 August 2018

ABSTRACT

Melanoma is one of the fastest-growing cancers in the world, and it can affect patients earlier in life than most other cancers. It is therefore imperative to identify patients at high risk for melanoma and enroll them in screening programs that detect the cancer early. In this study, we explore data from dermatology outpatients to build a risk model for the disease. Using millions of patient records with thousands of data points in each record, we show that a melanoma risk model can be built from real-world Electronic Health Record (EHR) data without any expert knowledge or manually engineered features. While other risk models for melanoma have been developed, this is the first to use routinely collected EHR data rather than expert features targeted specifically at melanoma. The random forest model achieves similar or better performance than these previous models (AUC 0.79, sensitivity 0.71, specificity 0.72), which allows larger populations of patients to be screened for melanoma risk without specialized and time-consuming data collection. Important features can be extracted from the model and studied, and the features influencing a specific prediction can be explained to providers and patients. The process for building this model can be further refined to improve performance, and it can also be used for risk prediction of other diseases.
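
For readers less familiar with the three metrics reported in the abstract, the following minimal sketch shows how sensitivity, specificity, and AUC can be computed from a classifier's risk scores. It is an illustration only, not the authors' evaluation code: the function names, the 0.5 threshold, and the toy labels and scores are assumptions made up for this example, and AUC is computed by direct pairwise comparison rather than by any particular library.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.5
) -> Tuple[float, float]:
    """Return (sensitivity, specificity), calling scores >= threshold positive."""
    tp = fn = tn = fp = 0
    for label, score in zip(y_true, y_score):
        predicted_positive = score >= threshold
        if label == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    # Sensitivity: fraction of true cases flagged; specificity: fraction of non-cases cleared.
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. This is quadratic in the number of samples, so it
    is only suitable for small illustrative inputs.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy, made-up data (1 = melanoma, 0 = no melanoma); not taken from the study.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1]

sens, spec = sensitivity_specificity(labels, scores, threshold=0.5)
auc = roc_auc(labels, scores)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} AUC={auc:.2f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the abstract reports all three.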


      • Published in

        BCB '18: Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics
        August 2018
        727 pages
        ISBN: 9781450357944
        DOI: 10.1145/3233547

        Copyright © 2018 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        BCB '18 Paper Acceptance Rate: 46 of 148 submissions, 31%
        Overall Acceptance Rate: 254 of 885 submissions, 29%
