Abstract
The aim of this work is to propose a novel view of the well-known clustering approach, which is dealt with here from a different perspective. We consider a reverse-engineering-type approach that consists in discovering the broadly meant values of the parameters of a clustering algorithm (including the choice of the algorithm itself or, more generally, of its class) that may have led to a given, a priori known partition of the data. We discuss the motivation for, and possible interpretations of, such a reversed process; the main motivation is to gain insight into the structure of a given data set, or even of a family of data sets. Evolutionary strategies are proposed to implement this reverse analysis computationally. The idea and feasibility of the proposed computational approach are illustrated on two benchmark-type data sets. The preliminary results are promising in terms of the balance between analytic and computational effectiveness and efficiency, the quality, comprehensibility and intuitive appeal of the results, a high application potential, and the possibilities for further extensions.
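The reverse paradigm described above can be made concrete with a minimal, self-contained sketch. The code below is our own illustration under stated assumptions, not the authors' implementation: a plain grid search over two parameters (the number of clusters k and the Minkowski exponent p of a deliberately simple k-means variant) stands in for the evolutionary strategy, and the toy data and all function names are hypothetical.

```python
# Illustrative sketch of the reverse-clustering idea: given a partition known
# a priori, search the space of clustering parameters (here: the number of
# clusters k and the Minkowski exponent p of a simple k-means variant) for
# the setting whose output best reproduces that partition. A plain grid
# search stands in for the evolutionary strategy used in the chapter; the
# toy data and all names are illustrative assumptions.
import itertools

def minkowski(a, b, p):
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def kmeans(points, k, p, iters=20):
    # deliberately simple k-means with deterministic initialization
    centers = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: minkowski(pt, centers[j], p))
                  for pt in points]
        for j in range(k):
            members = [pt for pt, l in zip(points, labels) if l == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels

def rand_index(a, b):
    # fraction of point pairs on which two partitions agree
    pairs = list(itertools.combinations(range(len(a)), 2))
    return sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs) / len(pairs)

# toy data: two well-separated blobs with an a priori known partition
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
          (5.0, 5.0), (5.1, 5.2), (5.2, 4.9)]
given = [0, 0, 0, 1, 1, 1]

# "reverse" step: recover the parameters that best explain the given partition
best_k, best_p = max(
    ((k, p) for k in (2, 3, 4) for p in (1.0, 2.0, 3.0)),
    key=lambda kp: rand_index(given, kmeans(points, kp[0], kp[1])))
```

On this toy data the search recovers k = 2, the setting under which the algorithm reproduces the given partition exactly; in the chapter this role is played by an evolutionary search over a much richer parameter space, with more refined partition-agreement measures such as the adjusted Rand index of Hubert and Arabie.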
To Professor Sadaaki Miyamoto, with deep appreciation for his long-time research and scholarly excellence and inspiration, and for so many years of friendship
Notes
- 1.
It is possible to start from a data similarity/distance matrix, if available, without an explicit characterization of the data in terms of attribute values; such a setting also seems to provide a reasonable context for the paradigm proposed in this paper, but we leave this case for a possible further study.
- 2.
The concept of Reverse Cluster Analysis was introduced by Ríos and Velásquez [25] in the case of SOM-based clustering, but it is meant there in a rather different sense, namely as associating the original data points with the nodes of the trained network.
- 3.
Possibly except for an identifier attribute, which makes it possible to distinguish particular elements of X.
- 4.
Actually, even in the “absolute” case doubts may arise if the situation resembles one of multiple overlapping distributions: although \(P_{A}\) is well established and “certain”, it is hardly reflected in the data represented by X, so that many objects \(x_{i}\) might equally well be assigned to different clusters.
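This footnote can be illustrated numerically. The sketch below is our own example, not the chapter's: two equal-prior, heavily overlapping one-dimensional Gaussian clusters, for which the posterior cluster membership of points between the means is nearly a coin flip, so even a well-established partition is hardly recoverable from the data alone.

```python
# Sketch of the overlapping-distributions situation (our illustration): two
# one-dimensional Gaussian clusters with equal priors; for points between
# the means the posterior cluster membership is close to 1/2.
import math

def normal_pdf(x, mu, sigma=1.0):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def posterior_cluster0(x, mu0=0.0, mu1=1.0):
    # P(cluster 0 | x) under equal priors, by Bayes' rule
    p0, p1 = normal_pdf(x, mu0), normal_pdf(x, mu1)
    return p0 / (p0 + p1)

# exactly midway between the means the assignment is maximally ambiguous
print(posterior_cluster0(0.5))  # 0.5 by symmetry
```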
- 5.
Series 2: a PC-class workstation with a 4-core 3.2 GHz processor, under 64-bit Linux Fedora 17; simulation software written in C and compiled with g++.
References
Arbelaitz, O., Gurrutxaga, I., Muguerza, J., Pérez, J.M., Perona, I. (2013) An extensive comparative study of cluster validity indices. Pattern Recognition 46 (1), 243–256.
Brun, M., Sima, Ch., Hua, J., Lowey, J., Carroll, B., Suh, E., Dougherty, E.R. (2007) Model-based evaluation of clustering validation measures. Pattern Recognition 40(3), 807–824.
Charrad, M., Ghazzali, N., Boiteau, V., Niknafs, A. (2014) NbClust: An R Package for Determining the Relevant Number of Clusters in a Data Set. Journal of Statistical Software, 61(6), 1–36.
Choi, S.-S., Cha, S.-K., Tappert, Ch.C. (2010) A survey of binary similarity and distance measures. Journal of Systemics, Cybernetics and Informatics, 8(1), 43–48.
Craven, M.W., Shavlik, J.W. (1995). Extracting comprehensible concept representations from trained neural networks. In: Working Notes of the IJCAI’95 Workshop on Comprehensibility in Machine Learning, Montreal, Canada, 61–75.
Cross, V., Sudkamp, Th.A. (2002) Similarity and compatibility in fuzzy set theory: assessment and applications. Physica-Verlag, Heidelberg; New York.
Das, S., Suganthan, P.N. (2011) Differential evolution: a survey of the state-of-the-art. IEEE Transactions on Evolutionary Computation, 15(1), 4–31.
De Amorim, R. (2015) Feature Relevance in Ward’s Hierarchical Clustering Using the L\(p\) Norm. Journal of Classification 32, 46–62.
Denœud, L., Guénoche, A. (2006) Comparison of distance indices between partitions. In Data Science and Classification, Springer, Berlin Heidelberg, 21–28.
Fisch, D., Gruber, T., Sick, B. (2011) SwiftRule: Mining Comprehensible Classification Rules for Time Series Analysis. IEEE Transactions on Knowledge and Data Engineering: 23 (5), 774–787.
Fisher, R.A. (1936) The use of multiple measurements in taxonomic problems. Annals of Eugenics 7 (2), 179–188.
Halkidi, M., Batistakis, Y., Vazirgiannis, M. (2001) On clustering validation techniques. Journal of Intelligent Information Systems, 17(2–3), 107–145.
Hastie, T., Tibshirani, R., Friedman, J. (2009) The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Second Edition. Springer-Verlag, New York.
Hubert, L., Arabie, P. (1985) Comparing partitions. Journal of Classification, 2(1), 193–218.
Kacprzyk, J., Zadrożny, S. (2013) Comprehensiveness of Linguistic Data Summaries: A Crucial Role of Protoforms. In: Christian Moewes and Andreas Nürnberger (Eds.): Computational Intelligence in Intelligent Data Analysis. Springer-Verlag, Berlin, Heidelberg, 207–221.
Kaufman, L., Rousseeuw, P.J. (1990) Finding Groups in Data: An Introduction to Cluster Analysis. Wiley, New York.
Lance, G.N., Williams, W.T. (1966) A General Theory of Classificatory Sorting Strategies. 1. Hierarchical systems. Computer Journal, 9, 373–380.
Maechler, M., et al. (2015) cluster: Cluster Analysis Basics and Extensions. R package, version 2.0.3.
Michalski, R. (1983) A theory and methodology of inductive learning. Artificial Intelligence: 20(2), 111–161.
Miyamoto, S. (2014) Classification Rules in Methods of Clustering (featured article). IEEE Intelligent Informatics Bulletin, 15(1), 15–21.
Miyamoto, S., Ichihashi, H., Honda, K. (2008) Algorithms for Fuzzy Clustering: Methods in c-Means Clustering with Applications. Springer-Verlag, Berlin Heidelberg, Studies in Fuzziness and Soft Computing 229.
Mullen, K.M, Ardia, D., Gil, D., Windover, D., Cline, J. (2011) DEoptim: An R Package for Global Optimization by Differential Evolution. Journal of Statistical Software, 40(6), 1–26.
Pryke, A., Beale, R. (2004) Interactive Comprehensible Data Mining. In: Y. Cai (ed.): Ambient Intelligence for Scientific Discovery, Springer, LNCS 3345, 48–65.
R Core Team (2014) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/.
Ríos, S.A., Velásquez J.D. (2011) Finding Representative Web Pages Based on a SOM and a Reverse Cluster Analysis. International Journal on Artificial Intelligence Tools 20(1) 93–118.
Stańczak, J. (2003) Biologically inspired methods for control of evolutionary algorithms. Control and Cybernetics, 32 (2), 411–433.
Storn, R., Price, K. (1997) Differential evolution – a simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization, 11(4), 341–359.
Torra, V., Endo, Y., Miyamoto, S. (2011) Computationally intensive parameter selection for clustering algorithms: The case of fuzzy \(c\)-means with tolerance. International Journal of Intelligent Systems, 26 (4), 313–322.
Zhou, Z.H. (2005) Comprehensibility of data mining algorithms. In: J. Wang (ed.): Encyclopedia of Data Warehousing and Mining, IGI Global: Hershey, 190–195.
Acknowledgements
Partially supported by the National Science Centre under Grant UMO-2012/05/B/ST6/03068.
Copyright information
© 2017 Springer International Publishing AG
About this chapter
Cite this chapter
Owsiński, J.W., Kacprzyk, J., Opara, K., Stańczak, J., Zadrożny, S. (2017). Using a Reverse Engineering Type Paradigm in Clustering. An Evolutionary Programming Based Approach. In: Torra, V., Dahlbom, A., Narukawa, Y. (eds) Fuzzy Sets, Rough Sets, Multisets and Clustering. Studies in Computational Intelligence, vol 671. Springer, Cham. https://doi.org/10.1007/978-3-319-47557-8_9
DOI: https://doi.org/10.1007/978-3-319-47557-8_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-47556-1
Online ISBN: 978-3-319-47557-8
eBook Packages: Engineering (R0)