Abstract
The paper presents an approach to an important problem in supervised learning: classification. Building models that generalize to new data, tuning them to improve their performance, and evaluating them are key tasks in this sub-field of machine learning. This paper focuses on model evaluation. It consists of two experiments, each with a specific purpose. The first experiment compares model evaluation methods using specific performance metrics: two models, logistic regression and k-nearest neighbors, are analyzed with three evaluation methods, namely the train-test approach, the test-set approach, and k-fold cross-validation. On the analyzed data set, logistic regression evaluated with k-fold cross-validation achieved reasonable results. The second experiment ranks the observations by their predicted probabilities with respect to the response vector, and uses these probabilities to try to improve a performance metric. For this particular task, the metric that extracts relevant information from the data and can be improved in this way is sensitivity.
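The two ideas in the abstract, k-fold cross-validation and improving sensitivity from predicted probabilities, can be illustrated with a minimal sketch in plain Python. It is not the author's code. The data are synthetic. A one-feature logistic regression fitted by gradient descent stands in for the paper's models. Every function name (`k_fold_indices`, `sensitivity`, `predict_at`, `fit_logistic`, `predict_proba`, `cross_val_sensitivity`) is hypothetical. The sketch also assumes one common way of acting on predicted probabilities: lowering the decision threshold below 0.5.

```python
import math
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def sensitivity(y_true, y_pred):
    """TP / (TP + FN): fraction of actual positives (label 1) predicted as positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if tp + fn else 0.0

def predict_at(probs, threshold):
    """Label an observation positive when its predicted probability >= threshold."""
    return [1 if p >= threshold else 0 for p in probs]

def fit_logistic(X, y, lr=0.1, epochs=500):
    """Hypothetical stand-in model: one-feature logistic regression, batch gradient descent on log-loss."""
    w, b, n = 0.0, 0.0, len(y)
    for _ in range(epochs):
        gw = gb = 0.0
        for x, t in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            gw += (p - t) * x
            gb += p - t
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b

def predict_proba(model, X):
    """Predicted probability of the positive class for each observation."""
    w, b = model
    return [1.0 / (1.0 + math.exp(-(w * x + b))) for x in X]

def cross_val_sensitivity(X, y, k=5, threshold=0.5):
    """Mean sensitivity over k folds; each fold serves exactly once as the held-out test set."""
    scores = []
    for test in k_fold_indices(len(y), k):
        held_out = set(test)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit_logistic([X[i] for i in train], [y[i] for i in train])
        probs = predict_proba(model, [X[i] for i in test])
        scores.append(sensitivity([y[i] for i in test], predict_at(probs, threshold)))
    return sum(scores) / len(scores)

# Synthetic, overlapping two-class data (NOT the paper's data set).
rng = random.Random(1)
y = [i % 2 for i in range(100)]
X = [rng.gauss(1.0 if t == 1 else -1.0, 1.0) for t in y]

sens_default = cross_val_sensitivity(X, y, threshold=0.5)
sens_lowered = cross_val_sensitivity(X, y, threshold=0.3)
print(f"sensitivity at 0.5: {sens_default:.3f}, at 0.3: {sens_lowered:.3f}")
```

Both runs reuse the same seeded folds and the same deterministic fits. Lowering the threshold can therefore only turn false negatives into true positives, so sensitivity cannot decrease. The price is more false positives, which means lower specificity. In practice that trade-off has to be judged against the cost of a missed diagnosis.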
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Coroiu, A.M. (2018). Model Evaluation as Approach to Predict a Diagnosis. In: Balas, V., Jain, L., Balas, M. (eds) Soft Computing Applications. SOFA 2016. Advances in Intelligent Systems and Computing, vol 634. Springer, Cham. https://doi.org/10.1007/978-3-319-62524-9_1
DOI: https://doi.org/10.1007/978-3-319-62524-9_1
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-62523-2
Online ISBN: 978-3-319-62524-9
eBook Packages: Engineering, Engineering (R0)