Abstract
This research presents a methodology for health data analytics through a case study for modelling cancer patient records. Timeline-structured clinical data systems represent a new approach to the understanding of the relationship between clinical activity, disease pathologies and health outcomes. The novel Southampton Breast Cancer Data System contains episode and timeline-structured records on >17,000 patients who have been treated in University Hospital Southampton and affiliated hospitals since the late 1970s. The system is under continuous development and validation. Modern data mining software and visual analytics tools permit new insights into temporally-structured clinical data. The challenges and outcomes of the application of such software-based systems to this complex data environment are reported here. The core data was anonymised and put through a series of pre-processing exercises to identify and exclude anomalous and erroneous data, before restructuring within a remote data warehouse. A range of approaches was tested on the resulting dataset including multi-dimensional modelling, sequential patterns mining and classification. Visual analytics software has enabled the comparison of survival times and surgical treatments. The systems tested proved to be powerful in identifying episode sequencing patterns which were consistent with real-world clinical outcomes. It is concluded that, subject to further refinement and selection, modern data mining techniques can be applied to large and heterogeneous clinical datasets to inform decision making.
Acknowledgements
This research project has been supported in part by a Southampton Solent Research Innovation and Knowledge Exchange (RIKE) award for “Solent Health Informatics Partnership” (Project ID: 1326). The authors would like to thank Solent students who made some contribution to the work: in particular Chantel Biddle, Adam Kershaw and Alex Potter. We are also pleased to acknowledge the generous support of colleagues in the University Hospital Southampton Informatics Team, in particular Adrian Byrne and David Cable.
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Lu, J., Hales, A., Rew, D. (2017). Modelling of Cancer Patient Records: A Structured Approach to Data Mining and Visual Analytics. In: Bursa, M., Holzinger, A., Renda, M., Khuri, S. (eds) Information Technology in Bio- and Medical Informatics. ITBAM 2017. Lecture Notes in Computer Science, vol 10443. Springer, Cham. https://doi.org/10.1007/978-3-319-64265-9_4
DOI: https://doi.org/10.1007/978-3-319-64265-9_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-64264-2
Online ISBN: 978-3-319-64265-9
eBook Packages: Computer Science (R0)