Abstract
In the preprocessing step of a knowledge discovery process, the choice of discretization method can have a remarkable impact on the performance and accuracy of classification algorithms. In this article, we analyze and compare expert discretization and automatic discretization algorithms. In particular, we study their impact on predicting the survival of patients in intensive care burn units. We focus on the quality of different discretization algorithms, analyzing the number of intervals generated, the number of patterns produced and the classification performance in a specific clinical problem. Our results show that many algorithms underperform expert discretization and that the correlation among continuous features must be taken into account to obtain the best accuracy.
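The following is a minimal sketch of one of the comparison criteria named above: the number of intervals a discretization generates. It contrasts a fixed list of cut points, standing in for an expert discretization, with unsupervised equal-width binning, a simple automatic method that is not necessarily one of the algorithms evaluated in the article. The values, the expert cut points and the helper names `discretize` and `equal_width_cut_points` are all hypothetical and for illustration only.

```python
from bisect import bisect_right
from typing import List, Sequence


def discretize(values: Sequence[float], cut_points: Sequence[float]) -> List[int]:
    """Map each value to the index of its interval.

    With sorted cut points c_0 < c_1 < ..., interval i covers [c_{i-1}, c_i),
    so k cut points define k + 1 intervals.
    """
    cuts = sorted(cut_points)
    # bisect_right counts how many cut points are <= v, which is the interval index.
    return [bisect_right(cuts, v) for v in values]


def equal_width_cut_points(values: Sequence[float], n_bins: int) -> List[float]:
    """Automatic, unsupervised equal-width binning: n_bins - 1 evenly spaced cut points."""
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + width * i for i in range(1, n_bins)]


# Hypothetical measurements of one continuous variable (illustrative only).
values = [2, 5, 9, 14, 21, 30, 38]

# Hypothetical "expert" cut points (illustrative, not taken from the article).
expert_labels = discretize(values, [10, 25])
auto_labels = discretize(values, equal_width_cut_points(values, n_bins=4))

# Number of intervals actually used, followed by the symbolic labels.
print(len(set(expert_labels)), expert_labels)  # 3 [0, 0, 0, 1, 1, 2, 2]
print(len(set(auto_labels)), auto_labels)      # 4 [0, 0, 0, 1, 2, 3, 3]
```

The two schemes place the same values in different intervals and produce different numbers of symbols. Differences like these propagate to the patterns mined from the discretized data and, ultimately, to classification performance.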
Acknowledgements
This work was partially funded by the Spanish Ministry of Economy and Competitiveness under project TIN2013-45491-R, the European Fund for Regional Development (EFRD), and the Instituto de Salud Carlos III (Ref: FIS PI 12/2898).
About this article
Cite this article
Casanova, I.J., Campos, M., Juarez, J.M. et al. Impact of time series discretization on intensive care burn unit survival classification. Prog Artif Intell 7, 41–53 (2018). https://doi.org/10.1007/s13748-017-0130-8
DOI: https://doi.org/10.1007/s13748-017-0130-8