Early-Stage Event Prediction for Longitudinal Data

Fard, Mahtab J.; Chawla, Sanjay; Reddy, Chandan K.

doi:10.1007/978-3-319-31753-3_12

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9651)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Abstract

Predicting event occurrence at an early stage of a longitudinal study is an important problem with high practical value. Unlike standard classification and regression problems, where a domain expert can label the data in a reasonably short period of time, training data in longitudinal studies can be obtained only by waiting for a sufficient number of events to occur. The main objective of this work is to predict whether an event will occur in the future for a particular subject, using only the data collected at the initial stages of the study. We propose a novel Early Stage Prediction (ESP) framework for building event prediction models that are trained at early stages of longitudinal studies. More specifically, we develop two probabilistic algorithms, ESP-NB and ESP-TAN, based on Naive Bayes and Tree-Augmented Naive Bayes (TAN), respectively. Both perform early-stage event prediction by modifying the posterior probability of event occurrence using extrapolations based on the Weibull and lognormal distributions. The proposed framework is evaluated on a wide range of synthetic and real-world benchmark datasets. Our extensive experiments show that the ESP framework predicts future event occurrences more accurately than alternative approaches while using only a limited amount of training data.
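The abstract does not give the exact form of the extrapolation, so the following is only a minimal sketch of the general idea, not the ESP-NB algorithm itself. It assumes binary features, and it assumes that the Weibull or lognormal parameters have already been fitted to the event times observed up to the early cutoff (the fitting step is omitted). All function names, parameter values, and feature probabilities are hypothetical. The sketch replaces the Naive Bayes prior with the fitted distribution's probability of an event by a later time and then applies Bayes' rule.

```python
import math


def weibull_cdf(t, shape, scale):
    """P(T <= t) for a Weibull distribution with the given shape and scale."""
    return 1.0 - math.exp(-((t / scale) ** shape))


def lognormal_cdf(t, mu, sigma):
    """P(T <= t) for a lognormal distribution (mu, sigma on the log scale)."""
    return 0.5 * (1.0 + math.erf((math.log(t) - mu) / (sigma * math.sqrt(2.0))))


def nb_likelihood(x, p_feature_given_class):
    """Naive Bayes likelihood of a binary feature vector x.

    p_feature_given_class[j] is P(x_j = 1 | class); features are treated
    as conditionally independent given the class.
    """
    lik = 1.0
    for xj, pj in zip(x, p_feature_given_class):
        lik *= pj if xj == 1 else (1.0 - pj)
    return lik


def posterior_event(x, prior_event, p_given_event, p_given_no_event):
    """P(event | x) by Bayes' rule, using the supplied prior of an event."""
    num = prior_event * nb_likelihood(x, p_given_event)
    den = num + (1.0 - prior_event) * nb_likelihood(x, p_given_no_event)
    return num / den


if __name__ == "__main__":
    # Hypothetical values: parameters assumed fitted on data up to t_early.
    shape, scale = 1.5, 10.0
    t_early, t_future = 2.0, 8.0

    # Hypothetical per-feature probabilities estimated from early-stage data.
    p_given_event = [0.7, 0.4]
    p_given_no_event = [0.3, 0.5]
    x = [1, 0]

    # Prior of an event by the early cutoff versus the extrapolated prior.
    prior_early = weibull_cdf(t_early, shape, scale)
    prior_future = weibull_cdf(t_future, shape, scale)

    print(posterior_event(x, prior_early, p_given_event, p_given_no_event))
    print(posterior_event(x, prior_future, p_given_event, p_given_no_event))
```

Swapping `weibull_cdf` for `lognormal_cdf` changes only how the prior is extrapolated; the Naive Bayes likelihood terms, which come from the early-stage data, stay the same.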

Notes

1. http://cran.rproject.org/web/packages/survival/

Acknowledgements

This work was supported in part by the National Science Foundation grants IIS-1527827 and IIS-1231742.

Author information

Authors and Affiliations

Computer Science Department, Wayne State University, Detroit, MI, 48202, USA
Mahtab J. Fard & Chandan K. Reddy
Qatar Computing Research Institute, HBKU, Ar-rayyan, Qatar
Sanjay Chawla
University of Sydney, Sydney, NSW, Australia
Sanjay Chawla

Corresponding author

Correspondence to Mahtab J. Fard.

Editor information

Editors and Affiliations

The University of Melbourne, Melbourne, Victoria, Australia
James Bailey
The University of Texas at Dallas, Richardson, Texas, USA
Latifur Khan
Osaka University, Osaka, Japan
Takashi Washio
University of Auckland, Auckland, New Zealand
Gill Dobbie
Shenzhen University, Shenzhen, China
Joshua Zhexue Huang
Massey University, Auckland, New Zealand
Ruili Wang

About this paper

Cite this paper

Fard, M.J., Chawla, S., Reddy, C.K. (2016). Early-Stage Event Prediction for Longitudinal Data. In: Bailey, J., Khan, L., Washio, T., Dobbie, G., Huang, J., Wang, R. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2016. Lecture Notes in Computer Science, vol 9651. Springer, Cham. https://doi.org/10.1007/978-3-319-31753-3_12

DOI: https://doi.org/10.1007/978-3-319-31753-3_12
Published: 12 April 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-31752-6
Online ISBN: 978-3-319-31753-3
eBook Packages: Computer Science (R0)
