
On the Advantage of Using Dedicated Data Mining Techniques to Predict Colorectal Cancer

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9105)

Abstract

Electronic Medical Records (EMRs) provide a wealth of data that can be used to generate predictive models for diseases. Many studies have used EMRs to build such models for specific diseases, but most rely on techniques traditionally used in the medical domain, such as logistic regression. This paper studies the benefit of using advanced data mining techniques to predict Colorectal Cancer (CRC). CRC is the second most common cancer in the EU and is known to have very non-specific predictors, which makes it difficult to generate good predictive models. In addition, EMR data poses its own challenges, including sparsity, differences in how physicians code the data, the temporal nature of the data, and class imbalance. Results show that state-of-the-art data mining techniques, including temporal data mining, are able to generate better predictive models than those currently available in the literature.
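The abstract names the temporal nature of EMR data as a challenge and temporal data mining as part of the answer. As a rough illustration only (this page does not describe the authors' actual method), the sketch below turns each patient's timestamped codes into binary "code A followed by code B within w days" features that a standard classifier could consume. The `(day, code)` event format, the helper names `has_sequence` and `temporal_features`, the code strings, and the `max_gap` window are all hypothetical choices made for this example.

```python
from itertools import permutations


def has_sequence(events: list[tuple[int, str]], first: str, second: str, max_gap: int) -> bool:
    """Return True if some `first` event is followed, strictly later and within
    `max_gap` days, by a `second` event. Events are hypothetical (day, code) pairs."""
    first_days = [day for day, code in events if code == first]
    second_days = [day for day, code in events if code == second]
    # Same-day pairs are excluded (gap must be > 0) so the order is unambiguous.
    return any(0 < s - f <= max_gap for f in first_days for s in second_days)


def temporal_features(
    patients: list[list[tuple[int, str]]], codes: list[str], max_gap: int
) -> tuple[list[str], list[list[int]]]:
    """Build one binary feature per ordered code pair (A -> B within max_gap days).

    Returns the feature names and one 0/1 row per patient, i.e. a fixed-length
    encoding of sparse, irregularly timed event histories."""
    pairs = list(permutations(codes, 2))  # all ordered pairs of distinct codes
    names = [f"{a}->{b}" for a, b in pairs]
    rows = [[int(has_sequence(events, a, b, max_gap)) for a, b in pairs] for events in patients]
    return names, rows
```

For example, with the hypothetical codes `["abdominal_pain", "anaemia"]` and `max_gap=90`, a patient with abdominal pain on day 0 and anaemia on day 30 gets the row `[1, 0]`, while a patient with anaemia on day 0 and abdominal pain on day 200 gets `[0, 0]`. Enumerating every ordered pair grows quadratically with the number of codes, so a realistic pipeline would usually prune candidate patterns first, for instance by frequency.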



Author information

Corresponding author

Correspondence to Reinier Kop.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Kop, R., Hoogendoorn, M., Moons, L.M.G., Numans, M.E., ten Teije, A. (2015). On the Advantage of Using Dedicated Data Mining Techniques to Predict Colorectal Cancer. In: Holmes, J., Bellazzi, R., Sacchi, L., Peek, N. (eds) Artificial Intelligence in Medicine. AIME 2015. Lecture Notes in Computer Science, vol 9105. Springer, Cham. https://doi.org/10.1007/978-3-319-19551-3_16

  • DOI: https://doi.org/10.1007/978-3-319-19551-3_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-19550-6

  • Online ISBN: 978-3-319-19551-3

  • eBook Packages: Computer Science, Computer Science (R0)
