Abstract
Traditional data mining algorithms assume that all data on a given object becomes available simultaneously (e.g., by accessing the object record in a database). However, certain real-world applications, studied under the name of survival analysis or event history analysis (EHA), involve monitoring specific objects, such as medical patients, over the course of their lifetime. The data streams produced by such applications contain various events related to the monitored objects. When we observe an infinite stream of events, at each point in time (the “cut-off point”) some of the monitored entities are “right-censored”: they have not yet experienced the event of interest, and we do not know when it will occur. In snapshot monitoring, the data stream is observed as a sequence of periodic snapshots. Given each snapshot, we are interested in estimating the probability of a critical event (e.g., patient death or equipment failure) as a function of time for every monitored object. In this research, we use fuzzy class label adjustment so that standard classification algorithms can seamlessly handle a snapshot stream of both censored and non-censored data. The objective is to provide reasonably accurate predictions after observing relatively few snapshots of the data stream and to improve the classification performance with the additional information obtained from each incoming snapshot. The proposed fuzzy-based methodology is evaluated on real-world snapshot streams from two different domains of survival analysis.
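To make the notions of cut-off point, right-censoring, and fuzzy class labels concrete, the following is a minimal Python sketch. It is not the chapter's algorithm, which the abstract does not specify. The `Monitored` record, the `fuzzy_labels` function, the fixed prediction `horizon`, and the peer-based membership heuristic (including the 0.5 fallback) are all assumptions introduced here for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Monitored:
    """Hypothetical record for one monitored object."""
    obj_id: str
    start: float                 # time at which monitoring began
    event_time: Optional[float]  # time of the critical event; None if it never occurs


def fuzzy_labels(objects: List[Monitored], cutoff: float, horizon: float) -> Dict[str, float]:
    """Membership of each object in the class 'event within `horizon` of start',
    using only the information visible in the snapshot taken at `cutoff`."""
    labels: Dict[str, float] = {}
    resolved = []  # (time to event, or inf if the horizon was survived; crisp label)
    censored = []  # (obj_id, age at cutoff) for right-censored objects

    for o in objects:
        if o.start > cutoff:
            continue  # not yet under monitoring at this snapshot
        age = cutoff - o.start
        seen_event = o.event_time is not None and o.event_time <= cutoff
        if seen_event and o.event_time - o.start <= horizon:
            labels[o.obj_id] = 1.0  # event observed inside the horizon
            resolved.append((o.event_time - o.start, 1.0))
        elif age >= horizon:
            labels[o.obj_id] = 0.0  # survived the whole horizon
            resolved.append((float("inf"), 0.0))
        else:
            censored.append((o.obj_id, age))  # outcome still unknown

    # Assumed heuristic: a censored object of age a receives the fraction of
    # resolved objects that survived past a and still had the event in time.
    for obj_id, age in censored:
        peers = [label for duration, label in resolved if duration > age]
        labels[obj_id] = sum(peers) / len(peers) if peers else 0.5

    return labels


snapshot = [
    Monitored("a", start=0.0, event_time=2.0),
    Monitored("b", start=0.0, event_time=None),
    Monitored("c", start=1.0, event_time=5.0),
    Monitored("d", start=7.0, event_time=20.0),  # event lies after the cut-off
    Monitored("e", start=12.0, event_time=None),  # enters after the cut-off
]
print(fuzzy_labels(snapshot, cutoff=10.0, horizon=5.0))
```

The resulting memberships could then be passed to a standard classifier, for example as instance weights on duplicated positive and negative copies of each censored object. Note that this peer-based estimate is only a rough illustration and is not an unbiased survival estimator.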
Acknowledgments.
This work was supported in part by the General Motors Global Research & Development - India Science Lab.
Copyright information
© 2015 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Last, M., Halpert, H. (2015). A Fuzzy-Based Approach to Survival Data Mining. In: Tamir, D., Rishe, N., Kandel, A. (eds) Fifty Years of Fuzzy Logic and its Applications. Studies in Fuzziness and Soft Computing, vol 326. Springer, Cham. https://doi.org/10.1007/978-3-319-19683-1_18
DOI: https://doi.org/10.1007/978-3-319-19683-1_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19682-4
Online ISBN: 978-3-319-19683-1
eBook Packages: Engineering, Engineering (R0)