Interactive survival analysis with the OCDM system: From development to application

Information Systems Frontiers

Abstract

Medical data mining is actively pursued in computer science and statistical research but not in medical practice. The reasons lie in the difficulty of handling and statistically analyzing medical data. We have developed a system that allows practitioners in the field to analyze their data interactively, without the assistance of statisticians or data mining experts. In this paper we introduce data mining of medical data and show how it can be carried out for survival data. We demonstrate how to solve common problems of interactive survival analysis by presenting the Online Clinical Data Mining (OCDM) system. The main focus is on similarity-based queries, a new method for selecting similar cases based on their covariates and the influence of these covariates on survival.

Notes

  1. OnLine Analytical Processing (OLAP): a hypothesis-driven, dimensional approach to decision support (Kimball 1996).

  2. The terms knowledge discovery in databases (KDD) and data mining (DM) are used in accordance with Fayyad et al. (1996). When referring to actual software systems, we use the terms data mining system and knowledge discovery system interchangeably.

  3. In this paper we use the term data warehouse in a broader sense, as a storage system for a large amount of information from different sources.

  4. A non-parametric method that estimates the conditional expectation of a random variable Y, given the value x of its covariate, by a locally weighted average of the observations Y_i in the vicinity of x. The weighting is governed by a kernel function and a bandwidth; see, e.g., Hastie et al. (2002).
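
The locally weighted average described in note 4 is commonly written as the Nadaraya-Watson estimator. The following is a minimal sketch of that general idea, not the OCDM implementation: the Gaussian kernel, the function names `gaussian_kernel` and `kernel_regression`, and the sample data are all assumptions chosen for illustration.

```python
import math


def gaussian_kernel(u):
    """Standard Gaussian kernel (an assumed choice; other kernels work too)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kernel_regression(x, xs, ys, bandwidth):
    """Estimate E[Y | X = x] as a kernel-weighted average of the ys.

    xs, ys    : observed covariate values and responses (same length)
    bandwidth : positive smoothing parameter; larger values average
                over a wider neighbourhood of x
    """
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")

    # Each observation is weighted by how close its covariate is to x.
    weights = [gaussian_kernel((x - xi) / bandwidth) for xi in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all weights underflowed; increase the bandwidth")

    return sum(w * y for w, y in zip(weights, ys)) / total


if __name__ == "__main__":
    # Hypothetical data: responses observed at four covariate values.
    xs = [0.0, 1.0, 2.0, 3.0]
    ys = [1.0, 2.0, 2.5, 4.0]
    print(kernel_regression(1.5, xs, ys, bandwidth=0.5))
```

A small bandwidth makes the estimate follow the nearest observations closely, while a large one approaches the global mean of the responses; choosing it is the subject of data-based bandwidth selection methods.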

References

  • Abe, H., Yokoi, H., Ohsaki, M., & Yamaguchi, T. (2007). Developing an integrated time-series data mining environment for medical data mining. In Seventh IEEE international conference on data mining workshops, ICDM Workshops 2007 (pp. 127–132).

  • Ahmad, I., & Ran, I. (2004). Data based bandwidth selection in kernel density estimation with parametric start via kernel contrasts. Journal of Nonparametric Statistics, 16(37), 841–877.

  • Black, N. (2003). Using clinical databases in practice. BMJ, 326(7379), 2–3.

  • Brameier, M., & Banzhaf, W. (2001). A comparison of linear genetic programming and neural networks in medical data mining. IEEE Transactions on Evolutionary Computation, 5(1), 17–26.

  • Cherkassky, V. (2007). Learning from data, 2nd edn. New York: Wiley.

  • Cios, K. J., & Moore, G. W. (2002). Uniqueness of medical data mining. Artificial Intelligence in Medicine, 26(1–2), 1–24.

  • Cox, D. R. (1972). Regression models and life-tables. Journal of the Royal Statistical Society Series B (Methodological), 34(3), 187–220.

  • Date, C. J. (2002). Introduction to database systems. Boston: Addison-Wesley Longman.

  • Delen, D., Walker, G., & Kadam, A. (2005). Predicting breast cancer survivability: A comparison of three data mining methods. Artificial Intelligence in Medicine, 34(3), 113–127.

  • Dippon, J., Fritz, P., & Kohler, M. (2002). A statistical approach to case based reasoning, with application to breast cancer data. Computational Statistics & Data Analysis, 40(3), 579–602.

  • Dyreson, C., Grandi, F., Käfer, W., Kline, N., Lorentzos, N., Mitsopoulos, Y., et al. (1994). A consensus glossary of temporal database concepts. ACM SIGMOD Record, 23(1), 52–64.

  • Eggebraaten, T. J., Tenner, J. W., & Dubbels, J. C. (2007). A health-care data model based on the HL7 reference information model. IBM Systems Journal, 46(1), 5–18.

  • Fayyad, U., Piatetsky-Shapiro, G., & Smyth, P. (1996). From data mining to knowledge discovery in databases. AI Magazine, 17, 37–54.

  • Fung, G., Yu, S., Dehing-Oberije, C., Ruysscher, D. D., Lambin, P., Krishnan, S., et al. (2008). Privacy-preserving predictive models for lung cancer survival analysis. In Privacy-preserving workshop at the SIAM data mining conference 2008.

  • Ghannad-Rezaie, M., Soltanian-Zadeh, H., Siadat, M. R., & Elisevich, K. (2006). Medical data mining using particle swarm optimization for temporal lobe epilepsy. In IEEE congress on evolutionary computation, CEC 2006 (pp. 761–768).

  • Györfi, L., Kohler, M., Krzyzak, A., & Walk, H. (2002). A distribution-free theory of nonparametric regression. New York: Springer.

  • Han, J., & Kamber, M. (2001). Data mining. San Francisco: Morgan Kaufmann.

  • Harkema, H., Setzer, A., Gaizauskas, R., Hepple, M., Power, R., & Rogers, J. (2005). Mining and modelling temporal clinical data. In Cox, S. (Ed.), Proceedings of the 4th UK e-Science all hands meeting. Nottingham, UK, available at: http://www.allhands.org.uk/2005/proceedings/.

  • Hastie, T. J., Tibshirani, R. J., & Friedman, J. H. (2002). The elements of statistical learning, corrected print. edn. New York: Springer.

  • Hoover, D. R., & He, Y. (1994). Nonidentified responses in a proportional hazards setting. Biometrics, 50(1), 1–10.

  • Houston, A. L., Chen, H., Hubbard, S. M., Schatz, B. R., Ng, T. D., Sewell, R. R., et al. (1999). Medical data mining on the internet: Research on a cancer information system. Artificial Intelligence Review, 13(5–6), 437–466.

  • Inokuchi, A., Takeda, K., Inaoka, N., & Wakao, F. (2007). MedTAKMI-CDI: Interactive knowledge discovery for clinical decision intelligence. IBM Systems Journal, 46(1), 115–133.

  • Kimball, R. (1996). The data warehouse toolkit. New York: Wiley.

  • Klein, J. P., & Moeschberger, M. L. (2005). Survival analysis, 2nd edn. New York: Springer.

  • Kleinbaum, D. G., & Klein, M. (2005). Survival analysis, 2nd edn. New York: Springer.

  • Lundin, J., Lundin, M., Isola, J., & Joensuu, H. (2003). Infopoints: A web-based system for individualised survival estimation in breast cancer. BMJ, 326(7379), 29.

  • McAullay, D., Williams, G., Chen, J., Jin, H., He, H., Sparks, R., et al. (2005). A delivery framework for health data mining and analytics. In ACSC ’05: Proceedings of the twenty-eighth Australasian conference on computer science (pp. 381–387). Darlinghurst: Australian Computer Society.

  • Meinicke, P., Brodag, T., Fricke, W. F., & Waack, S. (2006). P-value based visualization of codon usage data. Algorithms for Molecular Biology, 1, 10.

  • Mullins, I. M., Siadaty, M. S., Lyman, J., Scully, K., Garrett, C. T., Miller, W. G., et al. (2006). Data mining and clinical data repositories: Insights from a 667,000 patient data set. Computers in Biology and Medicine, 36(12), 1351–1377.

  • Ölund, G., Lindqvist, P., & Litton, J. E. (2007). BIMS: An information management system for biobanking in the 21st century. IBM Systems Journal, 46(1), 171–182.

  • Pedersen, T. B., & Jensen, C. S. (1998). Research issues in clinical data warehousing. In SSDBM ’98: Proceedings of the 10th international conference on scientific and statistical database management, IEEE computer society (pp. 43–52). Washington, DC, USA.

  • Pedersen, T. B., & Jensen, C. S. (1999). Multidimensional data modeling for complex data. In ICDE ’99: Proceedings of the 15th international conference on data engineering, IEEE computer society (p. 336). Washington, DC, USA.

  • R Development Core Team (2008). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. ISBN 3-900051-07-0, http://www.R-project.org.

  • Radespiel-Tröger, M., Rabenstein, T., Schneider, H. T., & Lausen, B. (2003). Comparison of tree-based methods for prognostic stratification of survival data. Artificial Intelligence in Medicine, 28(3), 323–341.

  • Russell, S. J., & Norvig, P. (2003). Artificial intelligence, 2nd edn. Englewood Cliffs: Prentice Hall.

Author information

Corresponding author

Correspondence to Sebastian Klenk.

About this article

Cite this article

Klenk, S., Dippon, J., Fritz, P. et al. Interactive survival analysis with the OCDM system: From development to application. Inf Syst Front 11, 391–403 (2009). https://doi.org/10.1007/s10796-009-9152-5

  • DOI: https://doi.org/10.1007/s10796-009-9152-5
