Abstract
Electronic Medical Records (EMRs) provide a wealth of data that can be used to generate predictive models for diseases. A considerable number of studies have used EMRs to generate such models for specific diseases, but most rely on more traditional techniques from the medical domain, such as logistic regression. This paper studies the benefit of using advanced data mining techniques to predict Colorectal Cancer (CRC). CRC is the second most common cancer in the EU and is known to be a disease with very non-specific predictors, which makes it difficult to generate good predictive models. In addition, EMR data poses its own challenges, including sparsity, differences in how physicians code the data, the temporal nature of the data, and imbalance in the data. Results show that state-of-the-art data mining techniques, including temporal data mining, are able to generate better predictive models than those currently available in the literature.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Kop, R., Hoogendoorn, M., Moons, L.M.G., Numans, M.E., ten Teije, A. (2015). On the Advantage of Using Dedicated Data Mining Techniques to Predict Colorectal Cancer. In: Holmes, J., Bellazzi, R., Sacchi, L., Peek, N. (eds) Artificial Intelligence in Medicine. AIME 2015. Lecture Notes in Computer Science, vol 9105. Springer, Cham. https://doi.org/10.1007/978-3-319-19551-3_16
DOI: https://doi.org/10.1007/978-3-319-19551-3_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19550-6
Online ISBN: 978-3-319-19551-3
eBook Packages: Computer Science (R0)