Impact of K-Means on the Performance of Classifiers for Labeled Data

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 94)

Abstract

In this study, a novel framework for data mining in clinical decision making is proposed. The framework addresses the problems of assessing and utilizing data mining models in the medical domain, and it consists of three stages. The first stage preprocesses the data to improve its quality. The second stage applies the k-means clustering algorithm to partition the data into k clusters (here k = 2, i.e., cluster0 / no and cluster1 / yes) in order to validate the class labels associated with the data. After clustering, the class label of each instance is compared with the label generated by the clustering algorithm. If the two labels agree, the instance is assumed to be correctly classified. Instances whose labels disagree are considered misclassified and are removed before further processing. In the third stage, a support vector machine (SVM) classifier is applied, and the classification model is validated using k-fold cross-validation. The performance of the SVM classifier is also compared with that of a Naive Bayes classifier; in our experiments, the SVM classifier outperforms the Naive Bayes classifier. To validate the proposed framework, experiments were carried out on two benchmark datasets, the Pima Indian diabetes dataset and the Wisconsin breast cancer dataset (WBCD). Both were obtained from the University of California at Irvine (UCI) machine learning repository. On both datasets, the proposed framework achieved higher classification accuracy than other classification algorithms applied to the same datasets in the literature. The performance of the proposed framework was also evaluated using sensitivity and specificity measures.
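The label-validation stage and the sensitivity/specificity measures described above can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation.

Several details are hypothetical and do not come from the paper:

- the function names `kmeans_two`, `filter_by_cluster_agreement` and `sensitivity_specificity`;
- the encoding of "no"/"yes" as 0/1;
- the deterministic farthest-point initialisation;
- the rule for matching clusters to labels (keep whichever of the two possible mappings agrees with more labels).

The preprocessing stage and the SVM / Naive Bayes classification stage are not shown.

```python
from math import dist  # Euclidean distance, Python 3.8+


def kmeans_two(points, max_iter=100):
    """Lloyd's k-means with k = 2; returns a cluster index (0 or 1) per point."""
    # Hypothetical deterministic initialisation: the first point, then the
    # point farthest from it (the source does not specify an initialisation).
    centroids = [list(points[0])]
    centroids.append(list(max(points, key=lambda p: dist(p, centroids[0]))))
    assign = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (ties -> 0).
        new_assign = [0 if dist(p, centroids[0]) <= dist(p, centroids[1]) else 1
                      for p in points]
        if new_assign == assign:
            break  # converged: no point changed cluster
        assign = new_assign
        # Update step: move each centroid to the mean of its members.
        for k in (0, 1):
            members = [p for p, a in zip(points, assign) if a == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def filter_by_cluster_agreement(points, labels):
    """Drop instances whose class label (0 = "no", 1 = "yes") disagrees with
    the k-means cluster they fall into; return the kept points and labels."""
    clusters = kmeans_two(points)
    # Cluster indices are arbitrary, so use whichever cluster-to-label mapping
    # (identity or swapped) agrees with more of the given labels.
    agree = sum(c == y for c, y in zip(clusters, labels))
    if agree < len(labels) - agree:
        clusters = [1 - c for c in clusters]
    kept = [i for i, (c, y) in enumerate(zip(clusters, labels)) if c == y]
    return [points[i] for i in kept], [labels[i] for i in kept]


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP); 1 = positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    ys = [0, 0, 0, 1, 1, 0]  # last point is labelled "no" but lies in the "yes" cluster
    kept_pts, kept_ys = filter_by_cluster_agreement(pts, ys)
    print(kept_ys)  # [0, 0, 0, 1, 1]
    print(sensitivity_specificity([1, 1, 0, 0], [1, 0, 0, 0]))  # (0.5, 1.0)
```

Matching clusters to labels by maximum agreement is only one reasonable reading of "cluster0 / no, cluster1 / yes". A fixed index-to-label mapping would depend on k-means' arbitrary cluster numbering. The filtered points and labels would then be passed to the classifier and evaluated with k-fold cross-validation.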

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Patil, B.M., Joshi, R.C., Toshniwal, D. (2010). Impact of K-Means on the Performance of Classifiers for Labeled Data. In: Ranka, S., et al. Contemporary Computing. IC3 2010. Communications in Computer and Information Science, vol 94. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14834-7_40

  • DOI: https://doi.org/10.1007/978-3-642-14834-7_40

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-14833-0

  • Online ISBN: 978-3-642-14834-7

  • eBook Packages: Computer Science, Computer Science (R0)
