A New Approach for Tuned Clustering Analysis

  • Conference paper
Machine Learning and Data Mining in Pattern Recognition (MLDM 2018)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10934)

Abstract

In this work, we present a new data mining (DM) approach, called tuned clustering analysis, which integrates clustering with tuned analysis of the resulting clusters. Usually, clusters that contain borderline results may be dismissed or ignored during the analysis stage. As a result, hidden insights that these clusters may represent are not revealed. This may harm the overall DM quality; in particular, important hidden insights may remain undiscovered. Our new approach offers an iterative process that assists the data miner in making appropriate analysis decisions and avoiding the dismissal of possible insights. The idea is to apply an iterative DM process: clustering, analyzing, and then either presenting new insights or tuning and re-clustering those clusters that have borderline values. Clusters with borderline values are chosen and a new sub-database is built from them. The sub-database is then split based on the attribute with the highest entropy value. The tuning iterations continue until new insights are found or the cluster quality falls below a certain threshold. We demonstrated tuned clustering analysis on real Echo heart measurements, using the km-Impute clustering algorithm. During the implementation, initial clusters were produced. Although the quality of these clusters was high, no new medical insights were revealed. We therefore applied clustering tuning and succeeded in finding new medical insights, such as the influence of gender and age on cardiac functioning and clinical modifications with regard to resilience to diastolic disorder. Our approach thus revealed new medical insights that were recovered from clusters with borderline values. This stands in contrast to traditional analysis methods, in which these potential insights may be missed or ignored.
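
The splitting step described in the abstract (pick the attribute with the highest entropy and partition the sub-database on it) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes every attribute is already discrete, uses Shannon entropy in bits, and the function names and the tiny sub-database are hypothetical, made up only for illustration.

```python
import math
from collections import Counter, defaultdict


def shannon_entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def split_on_highest_entropy(rows):
    """Partition rows (a list of dicts) by the attribute with the highest entropy.

    Assumption: attributes are categorical; continuous measurements would
    have to be binned first, which the abstract does not describe.
    """
    attributes = rows[0].keys()
    best = max(attributes, key=lambda a: shannon_entropy([r[a] for r in rows]))
    parts = defaultdict(list)
    for row in rows:
        parts[row[best]].append(row)
    return best, dict(parts)


# Hypothetical sub-database built from borderline clusters (made-up values).
sub_db = [
    {"gender": "F", "age_group": "60+"},
    {"gender": "M", "age_group": "60+"},
    {"gender": "F", "age_group": "60+"},
    {"gender": "M", "age_group": "40-59"},
]

attribute, partitions = split_on_highest_entropy(sub_db)
```

In this toy data, gender is evenly balanced while age group is skewed, so gender has the higher entropy and is chosen as the split attribute. In the iterative process, each resulting partition would then be re-clustered and its quality checked against the stopping threshold.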

Change history

  • 23 January 2019

    The original version of this chapter contained an error in the third author’s name. The spelling of Chaim Yosefy’s name was incorrect in the header of the paper. The author name has been corrected.

Author information

Corresponding author

Correspondence to Roni Ben Ishay.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Ben Ishay, R., Herman, M., Yosefy, C. (2018). A New Approach for Tuned Clustering Analysis. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2018. Lecture Notes in Computer Science, vol 10934. Springer, Cham. https://doi.org/10.1007/978-3-319-96136-1_34

  • DOI: https://doi.org/10.1007/978-3-319-96136-1_34

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-96135-4

  • Online ISBN: 978-3-319-96136-1

  • eBook Packages: Computer Science, Computer Science (R0)
