
Early-Stage Event Prediction for Longitudinal Data

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2016)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9651)

Abstract

Predicting event occurrence at an early stage of a longitudinal study is an important problem with high practical value. In standard classification and regression problems, a domain expert can provide labels for the data in a reasonably short period of time; in longitudinal studies, by contrast, training data can be obtained only by waiting for a sufficient number of events to occur. The main objective of this work is to predict future event occurrence for a particular subject using only the data collected at the initial stages of a longitudinal study. In this paper, we propose a novel Early Stage Prediction (ESP) framework for building event prediction models that are trained at early stages of longitudinal studies. More specifically, we develop two probabilistic algorithms based on Naive Bayes and Tree-Augmented Naive Bayes (TAN), called ESP-NB and ESP-TAN, respectively, for early-stage event prediction. Both modify the posterior probability of event occurrence using extrapolations based on the Weibull and Lognormal distributions. The proposed framework is evaluated on a wide range of synthetic and real-world benchmark datasets. Our extensive experiments show that the proposed ESP framework predicts future event occurrences more accurately than alternative approaches while using only a limited amount of training data.
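
The abstract does not give the exact form of the extrapolation, so the following is only a minimal sketch of one way the idea could look, not the authors' ESP-NB algorithm. It assumes the class prior in a Naive Bayes posterior is replaced by a Weibull cumulative probability evaluated at a future time. The function names, the per-feature likelihood inputs, and the shape and scale values are all hypothetical and chosen for illustration.

```python
import math


def weibull_cdf(t, shape, scale):
    """P(T <= t) for a Weibull distribution with the given shape and scale."""
    return 1.0 - math.exp(-((t / scale) ** shape))


def extrapolated_posterior(lik_event, lik_no_event, t_future, shape, scale):
    """Naive Bayes posterior of an event occurring by t_future.

    lik_event / lik_no_event are per-feature likelihoods P(x_j | class),
    assumed to be estimated from early-stage training data (hypothetical
    inputs). The class prior is replaced by the Weibull probability of an
    event by t_future, which is the assumed extrapolation step.
    """
    prior_event = weibull_cdf(t_future, shape, scale)
    score_event = prior_event * math.prod(lik_event)
    score_no_event = (1.0 - prior_event) * math.prod(lik_no_event)
    return score_event / (score_event + score_no_event)


# Same subject and features, two prediction horizons (illustrative values).
p_early = extrapolated_posterior([0.6, 0.7], [0.4, 0.5], 1.0, shape=1.5, scale=5.0)
p_late = extrapolated_posterior([0.6, 0.7], [0.4, 0.5], 4.0, shape=1.5, scale=5.0)
```

In this sketch the posterior grows with the prediction horizon because the Weibull prior does; a Lognormal variant would swap only `weibull_cdf` for the corresponding cumulative distribution function.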


Notes

  1. http://cran.rproject.org/web/packages/survival/.


Acknowledgements

This work was supported in part by the National Science Foundation grants IIS-1527827 and IIS-1231742.

Author information


Corresponding author

Correspondence to Mahtab J. Fard.


Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Fard, M.J., Chawla, S., Reddy, C.K. (2016). Early-Stage Event Prediction for Longitudinal Data. In: Bailey, J., Khan, L., Washio, T., Dobbie, G., Huang, J., Wang, R. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2016. Lecture Notes in Computer Science, vol 9651. Springer, Cham. https://doi.org/10.1007/978-3-319-31753-3_12


  • DOI: https://doi.org/10.1007/978-3-319-31753-3_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-31752-6

  • Online ISBN: 978-3-319-31753-3

  • eBook Packages: Computer Science (R0)
