Abstract
Healthcare organizations aim to derive valuable insights by applying data mining and soft computing techniques to the vast data stores accumulated over the years. This data, however, may contain missing, incorrect and, most often, incomplete instances that can have a detrimental effect on predictive analytics of healthcare data. Preprocessing this data, specifically imputing missing values, poses a challenge for reliable modeling. This work presents a novel preprocessing phase with missing value imputation for both numerical and categorical data. A hybrid combination of Classification and Regression Trees (CART) and Genetic Algorithms is adopted to impute missing continuous values, and Self Organizing Feature Maps (SOFM) are used to impute categorical values. Further, Artificial Neural Networks (ANN) are used to validate the improved prediction accuracy after imputation. To evaluate this model, we use the PIMA Indians Diabetes Data set (PIDD) and the Mammographic Mass Data (MMD). The accuracy of the proposed model, which emphasizes the preprocessing phase, is shown to be superior to that of existing techniques. The approach is simple, easy to implement and practically reliable.
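As a rough illustration of the tree-based part of the continuous-value imputation, the sketch below fits a small CART-style regression tree (splits chosen by variance reduction) on the complete records and uses it to predict the missing entries. It is not the authors' implementation: the Genetic Algorithm component, the SOFM step and all hyperparameters are not specified in the abstract and are omitted here. The function names, the depth limit and the toy records are hypothetical, and the sketch assumes that only the target column has missing values.

```python
from statistics import mean

def sse(values):
    """Sum of squared errors around the mean (regression impurity)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def best_split(rows, targets):
    """Return (feature_index, threshold) with the lowest total SSE, or None."""
    best, best_cost = None, sse(targets)
    for j in range(len(rows[0])):
        levels = sorted({r[j] for r in rows})
        # Candidate thresholds: midpoints between consecutive observed values.
        for lo, hi in zip(levels, levels[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, targets) if r[j] <= t]
            right = [y for r, y in zip(rows, targets) if r[j] > t]
            cost = sse(left) + sse(right)
            if cost < best_cost - 1e-12:  # accept only a real improvement
                best, best_cost = (j, t), cost
    return best

def build_tree(rows, targets, depth=0, max_depth=3):
    """Grow a regression tree; a leaf is a float, a node is a 4-tuple."""
    split = None
    if depth < max_depth and len(rows) >= 2:
        split = best_split(rows, targets)
    if split is None:
        return mean(targets)
    j, t = split
    left = [(r, y) for r, y in zip(rows, targets) if r[j] <= t]
    right = [(r, y) for r, y in zip(rows, targets) if r[j] > t]
    return (j, t,
            build_tree([r for r, _ in left], [y for _, y in left],
                       depth + 1, max_depth),
            build_tree([r for r, _ in right], [y for _, y in right],
                       depth + 1, max_depth))

def predict(tree, row):
    """Walk from the root to a leaf and return the leaf mean."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if row[j] <= t else right
    return tree

def impute_column(records, col):
    """Fill None entries of column `col` with tree predictions.

    Assumes every other column is fully observed (hypothetical simplification).
    """
    others = [k for k in range(len(records[0])) if k != col]
    known = [r for r in records if r[col] is not None]
    tree = build_tree([[r[k] for k in others] for r in known],
                      [r[col] for r in known])
    filled = []
    for r in records:
        row = list(r)
        if row[col] is None:
            row[col] = predict(tree, [r[k] for k in others])
        filled.append(row)
    return filled

# Synthetic toy records (not taken from PIDD or MMD): [feature, target].
records = [
    [1.0, 10.0], [2.0, 10.0], [3.0, 10.0],
    [8.0, 50.0], [9.0, 50.0], [10.0, 50.0],
    [2.5, None], [9.5, None],
]
filled = impute_column(records, col=1)
```

In this toy case the tree splits the feature at 5.5, so the two incomplete records receive the mean target of their respective leaf. A search procedure such as a genetic algorithm could, in principle, tune choices like `max_depth`, but how the paper combines the two is not stated in the abstract.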
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bhat, V.H., Rao, P.G., Krishna, S., Shenoy, P.D., Venugopal, K.R., Patnaik, L.M. (2011). An Efficient Framework for Prediction in Healthcare Data Using Soft Computing Techniques. In: Abraham, A., Mauri, J.L., Buford, J.F., Suzuki, J., Thampi, S.M. (eds) Advances in Computing and Communications. ACC 2011. Communications in Computer and Information Science, vol 192. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22720-2_55
DOI: https://doi.org/10.1007/978-3-642-22720-2_55
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-22719-6
Online ISBN: 978-3-642-22720-2
eBook Packages: Computer Science, Computer Science (R0)