A New Approach for Tuned Clustering Analysis

Ben Ishay, Roni; Herman, Maya; Yosefy, Chaim

doi:10.1007/978-3-319-96136-1_34


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10934)

Included in the following conference series:

International Conference on Machine Learning and Data Mining in Pattern Recognition

The original version of this chapter was revised: The spelling of the third author’s name was corrected. The correction to this chapter is available at https://doi.org/10.1007/978-3-319-96136-1_35

Abstract

In this work, we present a new data mining (DM) approach, called tuned clustering analysis, which integrates clustering with a tuned analysis of the resulting clusters. Clusters that contain borderline results are usually dismissed or ignored during the analysis stage. As a result, hidden insights that these clusters may represent are not revealed, which harms the overall DM quality and, in particular, leaves important insights undiscovered. Our approach offers an iterative process that assists the data miner in making appropriate analysis decisions and in avoiding the dismissal of possible insights. The idea is to apply an iterative DM process: clustering, analyzing, and then either presenting new insights or tuning and re-clustering the clusters that have borderline values. Clusters with borderline values are chosen and a new sub-database is built from them. The sub-database is then split on the attribute with the highest entropy value. The tuning iterations continue until new insights are found or the cluster quality falls below a certain threshold. We demonstrated tuned clustering analysis on real echocardiography (Echo) heart measurements, using the km-Impute clustering algorithm. The implementation first produced initial clusters. Although the quality of these clusters was high, they revealed no new medical insights. We therefore applied clustering tuning and found new medical insights, such as the influence of gender and age on cardiac functioning and clinical modifications with regard to resilience to diastolic disorder. Our approach thus revealed new medical insights that were recovered from borderline-value clusters. This stands in contrast to traditional analysis methods, in which these potential insights may be missed or ignored.
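The splitting step described in the abstract (choose the attribute with the highest entropy, then split the sub-database on it) can be illustrated with a minimal Python sketch. The abstract does not say how entropy is computed or how continuous Echo measurements are handled, so the sketch assumes Shannon entropy over already-discretized attribute values. The function names, the attribute names, and the toy records are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy, in bits, of a sequence of discrete values."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def highest_entropy_attribute(records, attributes):
    """Return the attribute whose values have the highest entropy."""
    return max(attributes, key=lambda a: shannon_entropy([r[a] for r in records]))


def split_by_attribute(records, attribute):
    """Group records by their value of the chosen attribute."""
    groups = {}
    for record in records:
        groups.setdefault(record[attribute], []).append(record)
    return groups


# Hypothetical sub-database built from borderline-value clusters.
sub_db = [
    {"gender": "F", "age_group": "60+"},
    {"gender": "M", "age_group": "60+"},
    {"gender": "F", "age_group": "60+"},
    {"gender": "M", "age_group": "40-59"},
]

split_attribute = highest_entropy_attribute(sub_db, ["gender", "age_group"])
partitions = split_by_attribute(sub_db, split_attribute)
```

In this toy data the gender attribute is evenly divided while the age group is skewed, so the split falls on gender. Each resulting partition would then be re-clustered and re-analyzed in the next tuning iteration; the stopping threshold on cluster quality is left out because the abstract does not give its value.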

Change history

23 January 2019
The original version of this chapter contained an error in the third author’s name. The spelling of Chaim Yosefy’s name was incorrect in the header of the paper. The author name has been corrected.

Author information

Authors and Affiliations

The Open University of Israel, Raanana, Israel
Roni Ben Ishay & Maya Herman
The Barzili Medical Center Campus, Ben-Gurion University, Ashkelon, Israel
Chaim Yosefy

Corresponding author

Correspondence to Roni Ben Ishay.

Editor information

Editors and Affiliations

Institute of Computer Vision and Applied Computer Sciences, Leipzig, Germany
Petra Perner

About this paper

Cite this paper

Ben Ishay, R., Herman, M., Yosefy, C. (2018). A New Approach for Tuned Clustering Analysis. In: Perner, P. (eds) Machine Learning and Data Mining in Pattern Recognition. MLDM 2018. Lecture Notes in Computer Science, vol 10934. Springer, Cham. https://doi.org/10.1007/978-3-319-96136-1_34

DOI: https://doi.org/10.1007/978-3-319-96136-1_34
Published: 08 July 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-96135-4
Online ISBN: 978-3-319-96136-1
eBook Packages: Computer Science, Computer Science (R0)
