Abstract
The aim of this work is to propose a novel view of the well-known clustering approach, which is dealt with here from a different perspective. We consider a reverse-engineering-type approach that consists in discovering the broadly meant values of the parameters of a clustering algorithm (including the choice of the algorithm itself or, more generally, of its class) that may have led to a given, a priori known partition of the data. We discuss the motivation for, and possible interpretations of, such a reversed process; the main motivation is to gain insight into the structure of a given data set, or even of a family of data sets. Evolutionary strategies are proposed to implement this reverse analysis computationally. The idea and feasibility of the proposed computational approach are illustrated on two benchmark-type data sets. The preliminary results are promising in terms of the balance between analytic and computational effectiveness and efficiency, the quality, comprehensibility and intuitive appeal of the results, a high application potential, and the possibilities for further extensions.
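The reverse paradigm described above can be made concrete with a minimal, self-contained sketch. The code below is our own illustration under stated assumptions, not the authors' implementation: a plain grid search over two parameters (the number of clusters k and the Minkowski exponent p of a deliberately simple k-means variant) stands in for the evolutionary strategy, and the toy data and all function names are hypothetical.

```python
# Illustrative sketch of the reverse-clustering idea: given a partition known
# a priori, search the space of clustering parameters (here: the number of
# clusters k and the Minkowski exponent p of a simple k-means variant) for
# the setting whose output best reproduces that partition. A plain grid
# search stands in for the evolutionary strategy used in the chapter; the
# toy data and all names are illustrative assumptions.
import itertools

def minkowski(a, b, p):
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def kmeans(points, k, p, iters=20):
    # deliberately simple k-means with deterministic initialization
    centers = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: minkowski(pt, centers[j], p))
                  for pt in points]
        for j in range(k):
            members = [pt for pt, l in zip(points, labels) if l == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels

def rand_index(a, b):
    # fraction of point pairs on which two partitions agree
    pairs = list(itertools.combinations(range(len(a)), 2))
    return sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs) / len(pairs)

# toy data: two well-separated blobs with an a priori known partition
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
          (5.0, 5.0), (5.1, 5.2), (5.2, 4.9)]
given = [0, 0, 0, 1, 1, 1]

# "reverse" step: recover the parameters that best explain the given partition
best_k, best_p = max(
    ((k, p) for k in (2, 3, 4) for p in (1.0, 2.0, 3.0)),
    key=lambda kp: rand_index(given, kmeans(points, kp[0], kp[1])))
```

On this toy data the search recovers k = 2, the setting under which the algorithm reproduces the given partition exactly; in the chapter this role is played by an evolutionary search over a much richer parameter space, with more refined partition-agreement measures such as the adjusted Rand index of Hubert and Arabie.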
To Professor Sadaaki Miyamoto, with deep appreciation for his long-time research and scholarly excellence and inspiration, and for so many years of friendship
Notes
- 1.
It is possible to start from a data similarity/distance matrix, if available, without an explicit characterization of the data in terms of attribute values; such a setting also seems to provide a reasonable context for the paradigm proposed in this paper, but we leave this case for a possible further study.
- 2.
The concept of Reverse Cluster Analysis was introduced by Ríos and Velásquez [25] in the case of SOM-based clustering, but it is meant there in a rather different sense, namely as associating the original data points with the nodes of the trained network.
- 3.
Possibly except for an identifier attribute, which makes it possible to distinguish particular elements of X.
- 4.
Actually, even in the “absolute” case doubts may arise if the situation resembles one of multiple overlapping distributions: although \(P_{A}\) is well established and “certain”, it is hardly reflected in the data represented by X, so that many objects \(x_{i}\) might equally well be assigned to different clusters.
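This footnote can be illustrated numerically. The sketch below is our own example, not the chapter's: two equal-prior, heavily overlapping one-dimensional Gaussian clusters, for which the posterior cluster membership of points between the means is nearly a coin flip, so even a well-established partition is hardly recoverable from the data alone.

```python
# Sketch of the overlapping-distributions situation (our illustration): two
# one-dimensional Gaussian clusters with equal priors; for points between
# the means the posterior cluster membership is close to 1/2.
import math

def normal_pdf(x, mu, sigma=1.0):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def posterior_cluster0(x, mu0=0.0, mu1=1.0):
    # P(cluster 0 | x) under equal priors, by Bayes' rule
    p0, p1 = normal_pdf(x, mu0), normal_pdf(x, mu1)
    return p0 / (p0 + p1)

# exactly midway between the means the assignment is maximally ambiguous
print(posterior_cluster0(0.5))  # 0.5 by symmetry
```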
- 5.
Series 2: a PC-class workstation with a 4-core 3.2 GHz processor, under 64-bit Linux Fedora 17; simulation software written in C and compiled with g++.
References
Arbelaitz, O., Gurrutxaga, I., Muguerza, J., Pérez, J.M., Perona, I. (2013) An extensive comparative study of cluster validity indices. Pattern Recognition 46 (1), 243–256.
Brun, M., Sima, Ch., Hua, J., Lowey, J., Carroll, B., Suh, E., Dougherty, E.R. (2007) Model-based evaluation of clustering validation measures. Pattern Recognition 40(3), 807–824.
Charrad, M., Ghazzali, N., Boiteau, V., Niknafs, A. (2014) NbClust: An R Package for Determining the Relevant Number of Clusters in a Data Set. Journal of Statistical Software, 61(6), 1–36.
Choi, S.-S., Cha, S.-K., Tappert, Ch.C. (2010) A survey of binary similarity and distance measures. Journal of Systemics, Cybernetics and Informatics, 8(1), 43–48.
Craven, M.W., Shavlik, J.W. (1995). Extracting comprehensible concept representations from trained neural networks. In: Working Notes of the IJCAI’95 Workshop on Comprehensibility in Machine Learning, Montreal, Canada, 61–75.
Cross, V., Sudkamp, Th.A. (2002) Similarity and compatibility in fuzzy set theory: assessment and applications. Physica-Verlag, Heidelberg; New York.
Das, S., Suganthan, P.N. (2011) Differential evolution: a survey of the state-of-the-art. IEEE Transactions on Evolutionary Computation, 15(1), 4–31.
De Amorim, R. (2015) Feature Relevance in Ward’s Hierarchical Clustering Using the L\(p\) Norm. Journal of Classification 32, 46–62.
Denœud, L., Guénoche, A. (2006) Comparison of distance indices between partitions. In Data Science and Classification, Springer, Berlin Heidelberg, 21–28.
Fisch, D., Gruber, T., Sick, B. (2011) SwiftRule: Mining Comprehensible Classification Rules for Time Series Analysis. IEEE Transactions on Knowledge and Data Engineering: 23 (5), 774–787.
Fisher, R.A. (1936) The use of multiple measurements in taxonomic problems. Annals of Eugenics 7 (2), 179–188.
Halkidi, M., Batistakis, Y., Vazirgiannis, M. (2001) On clustering validation techniques. Journal of Intelligent Information Systems, 17(2–3), 107–145.
Hastie, T., Tibshirani, R., Friedman, J. (2009) The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Second Edition. Springer-Verlag, New York.
Hubert, L., Arabie, P. (1985) Comparing partitions. Journal of Classification, 2(1), 193–218.
Kacprzyk, J., Zadrożny, S. (2013) Comprehensiveness of Linguistic Data Summaries: A Crucial Role of Protoforms. In: Christian Moewes and Andreas Nürnberger (Eds.): Computational Intelligence in Intelligent Data Analysis. Springer-Verlag, Berlin, Heidelberg, 207–221.
Kaufman, L., Rousseeuw, P.J. (1990) Finding Groups in Data: An Introduction to Cluster Analysis. Wiley, New York.
Lance, G.N., Williams, W.T. (1966) A General Theory of Classificatory Sorting Strategies. 1. Hierarchical systems. Computer Journal, 9, 373–380.
Maechler, M., et al. (2015) cluster: Cluster Analysis Basics and Extensions. R package, version 2.0.3.
Michalski, R. (1983) A theory and methodology of inductive learning. Artificial Intelligence: 20(2), 111–161.
Miyamoto, S. (2014) Classification Rules in Methods of Clustering (featured article). IEEE Intelligent Informatics Bulletin, 15(1), 15–21.
Miyamoto, S., Ichihashi, H., Honda, K. (2008) Algorithms for Fuzzy Clustering: Methods in c-Means Clustering with Applications. Springer-Verlag, Berlin Heidelberg, Studies in Fuzziness and Soft Computing 229.
Mullen, K.M, Ardia, D., Gil, D., Windover, D., Cline, J. (2011) DEoptim: An R Package for Global Optimization by Differential Evolution. Journal of Statistical Software, 40(6), 1–26.
Pryke, A., Beale, R. (2004) Interactive Comprehensible Data Mining. In: Y. Cai (ed.): Ambient Intelligence for Scientific Discovery, Springer, LNCS 3345, 48–65.
R Core Team (2014) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/.
Ríos, S.A., Velásquez J.D. (2011) Finding Representative Web Pages Based on a SOM and a Reverse Cluster Analysis. International Journal on Artificial Intelligence Tools 20(1) 93–118.
Stańczak, J. (2003) Biologically inspired methods for control of evolutionary algorithms. Control and Cybernetics, 32 (2), 411–433.
Storn, R., Price, K. (1997) Differential evolution – a simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization, 11(4), 341–359.
Torra, V., Endo, Y., Miyamoto, S. (2011) Computationally intensive parameter selection for clustering algorithms: The case of fuzzy \(c\)-means with tolerance. International Journal of Intelligent Systems, 26 (4), 313–322.
Zhou, Z.H. (2005) Comprehensibility of data mining algorithms. In: J. Wang (ed.): Encyclopedia of Data Warehousing and Mining, IGI Global: Hershey, 190–195.
Acknowledgements
Partially supported by the National Science Centre under Grant UMO-2012/05/B/ST6/03068.
Copyright information
© 2017 Springer International Publishing AG
About this chapter
Cite this chapter
Owsiński, J.W., Kacprzyk, J., Opara, K., Stańczak, J., Zadrożny, S. (2017). Using a Reverse Engineering Type Paradigm in Clustering. An Evolutionary Programming Based Approach. In: Torra, V., Dahlbom, A., Narukawa, Y. (eds) Fuzzy Sets, Rough Sets, Multisets and Clustering. Studies in Computational Intelligence, vol 671. Springer, Cham. https://doi.org/10.1007/978-3-319-47557-8_9
DOI: https://doi.org/10.1007/978-3-319-47557-8_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-47556-1
Online ISBN: 978-3-319-47557-8
eBook Packages: Engineering (R0)