Using a Reverse Engineering Type Paradigm in Clustering. An Evolutionary Programming Based Approach

Fuzzy Sets, Rough Sets, Multisets and Clustering

Part of the book series: Studies in Computational Intelligence (SCI, volume 671)

Abstract

The aim of this work is to propose a novel view of the well-known clustering approach, which is dealt with here from a different perspective. We consider a kind of reverse-engineering approach, which basically consists in discovering the broadly meant values of the parameters of the clustering algorithm, including the choice of the algorithm itself or, more generally, its class, and some other parameters, that may have led to a given partition of the data, known a priori. We discuss the motivation for, and possible interpretations of, such a reversed process. The main motivation is to gain insight into the structure of the given data set or even a family of data sets. The use of evolutionary strategies is proposed to implement such a reverse analysis computationally. The idea and feasibility of the proposed computational approach are illustrated on two benchmark-type data sets. The preliminary results are promising in terms of a balance between analytic and computational effectiveness and efficiency, the quality of the results obtained and their comprehensiveness and intuitive appeal, a high application potential, and possibilities for further extensions.
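The reverse analysis outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and every concrete choice in it is an assumption made for illustration only: k-means as the clustering algorithm, the number of clusters and per-feature weights as the parameters to be recovered, the plain Rand index as the measure of agreement with the given partition, a simple (1+λ) evolution strategy as the search method, and a small synthetic data set. The names `rand_index`, `kmeans` and `reverse_cluster` are hypothetical.

```python
import random


def rand_index(a, b):
    """Fraction of object pairs on which two partitions agree."""
    n = len(a)
    agree = 0
    for i in range(n):
        for j in range(i + 1, n):
            agree += (a[i] == a[j]) == (b[i] == b[j])
    return agree / (n * (n - 1) / 2)


def kmeans(data, k, weights, iters=20):
    """Plain k-means with a feature-weighted squared Euclidean distance.

    Centres are seeded deterministically (farthest-first), so the same
    parameters always yield the same partition.
    """
    def dist(p, q):
        return sum(w * (x - y) ** 2 for w, x, y in zip(weights, p, q))

    centres = [data[0]]
    while len(centres) < k:
        centres.append(max(data, key=lambda p: min(dist(p, c) for c in centres)))
    labels = [0] * len(data)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist(p, centres[c])) for p in data]
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:  # keep the old centre if a cluster becomes empty
                centres[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def reverse_cluster(data, target, generations=40, offspring=8, seed=0):
    """Search for (k, feature weights) whose k-means partition matches `target`.

    Assumed search method: an elitist (1 + lambda) evolution strategy.
    """
    rng = random.Random(seed)

    def fitness(ind):
        k, weights = ind
        return rand_index(kmeans(data, k, weights), target)

    def mutate(ind):
        k, weights = ind
        k = min(6, max(2, k + rng.choice((-1, 0, 1))))  # k restricted to 2..6
        weights = [max(0.0, w + rng.gauss(0.0, 0.3)) for w in weights]
        return k, weights

    best = (2, [1.0] * len(data[0]))
    best_fit = fitness(best)
    for _ in range(generations):
        for child in [mutate(best) for _ in range(offspring)]:
            f = fitness(child)
            if f >= best_fit:  # elitist: the best fitness never decreases
                best, best_fit = child, f
    return best, best_fit


# Synthetic example (assumption): the given partition depends only on the
# first feature; the second feature is widely spread noise.
_rng = random.Random(1)
target = [i // 10 for i in range(30)]
data = [(5.0 * g + _rng.uniform(-1, 1), _rng.uniform(0, 30)) for g in target]

best, fit = reverse_cluster(data, target)
print("recovered k:", best[0])
print("recovered feature weights:", [round(w, 2) for w in best[1]])
print("Rand index against the given partition:", round(fit, 3))
```

In this toy setting the recovered parameters are themselves the result of interest: a small weight on the second feature would indicate that the given partition is driven by the first feature, which is the kind of insight into the data structure that the abstract names as the main motivation.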

To Sadaaki, Professor Sadaaki Miyamoto, with deep appreciation for his long-time research and scholarly excellence and inspiration, and for so many years of friendship

Notes

  1. It is possible to start with a data similarity/distance matrix, if available, without an explicit characterization of the data in terms of attribute values. Such a setting also seems to provide a reasonable context for the paradigm proposed in this paper, but we leave this case for a possible further study.

  2. The concept of a Reverse Cluster Analysis was introduced by Ríos and Velásquez [25] in the case of SOM-based clustering, but it is meant there in a rather different sense, as associating original data points with the nodes in the trained network.

  3. Possibly except for an identifier attribute, which makes it possible to distinguish particular elements of X.

  4. Actually, even in the “absolute” case, doubts may arise if the situation resembles that of multiple overlapping distributions, i.e. although \(P_{A}\) is well established and “certain”, it is hardly reflected in the data, represented by X, so that many objects \(x_{i}\) might be equally well assigned to different clusters.

  5. Series 2: PC class 4-core processor station, 3.2 GHz, under 64-bit Linux Fedora 17; simulation software written in C and compiled with g++.

References

  1. Arbelaitz, O., Gurrutxaga, I., Muguerza, J., Pérez, J.M., Perona, I. (2013) An extensive comparative study of cluster validity indices. Pattern Recognition 46(1), 243–256.

  2. Brun, M., Sima, Ch., Hua, J., Lowey, J., Carroll, B., Suh, E., Dougherty, E.R. (2007) Model-based evaluation of clustering validation measures. Pattern Recognition 40(3), 807–824.

  3. Charrad, M., Ghazzali, N., Boiteau, V., Niknafs, A. (2014) NbClust: An R Package for Determining the Relevant Number of Clusters in a Data Set. Journal of Statistical Software 61(6), 1–36.

  4. Choi, S.-S., Cha, S.-K., Tappert, Ch.C. (2010) A survey of binary similarity and distance measures. Journal of Systemics, Cybernetics and Informatics 8(1), 43–48.

  5. Craven, M.W., Shavlik, J.W. (1995) Extracting comprehensible concept representations from trained neural networks. In: Working Notes of the IJCAI’95 Workshop on Comprehensibility in Machine Learning, Montreal, Canada, 61–75.

  6. Cross, V., Sudkamp, Th.A. (2002) Similarity and compatibility in fuzzy set theory: assessment and applications. Physica-Verlag, Heidelberg, New York.

  7. Das, S., Suganthan, P.N. (2011) Differential evolution: a survey of the state-of-the-art. IEEE Transactions on Evolutionary Computation 15(1), 4–31.

  8. De Amorim, R. (2015) Feature Relevance in Ward’s Hierarchical Clustering Using the L\(p\) Norm. Journal of Classification 32, 46–62.

  9. Denœud, L., Guénoche, A. (2006) Comparison of distance indices between partitions. In: Data Science and Classification, Springer, Berlin Heidelberg, 21–28.

  10. Fisch, D., Gruber, T., Sick, B. (2011) SwiftRule: Mining Comprehensible Classification Rules for Time Series Analysis. IEEE Transactions on Knowledge and Data Engineering 23(5), 774–787.

  11. Fisher, R.A. (1936) The use of multiple measurements in taxonomic problems. Annals of Eugenics 7(2), 179–188.

  12. Halkidi, M., Batistakis, Y., Vazirgiannis, M. (2001) On clustering validation techniques. Journal of Intelligent Information Systems 17(2–3), 107–145.

  13. Hastie, T., Tibshirani, R., Friedman, J. (2009) The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Second Edition. Springer-Verlag, New York.

  14. Hubert, L., Arabie, P. (1985) Comparing partitions. Journal of Classification 2(1), 193–218.

  15. Kacprzyk, J., Zadrożny, S. (2013) Comprehensiveness of Linguistic Data Summaries: A Crucial Role of Protoforms. In: C. Moewes and A. Nürnberger (eds.): Computational Intelligence in Intelligent Data Analysis. Springer-Verlag, Berlin, Heidelberg, 207–221.

  16. Kaufman, L., Rousseeuw, P.J. (1990) Finding Groups in Data: An Introduction to Cluster Analysis. Wiley, New York.

  17. Lance, G.N., Williams, W.T. (1966) A General Theory of Classificatory Sorting Strategies. 1. Hierarchical systems. Computer Journal 9, 373–380.

  18. Maechler, M., et al. (2015) cluster: cluster analysis extended Rousseeuw et al. R package, version 2.0.3.

  19. Michalski, R. (1983) A theory and methodology of inductive learning. Artificial Intelligence 20(2), 111–161.

  20. Miyamoto, S. (2014) Classification Rules in Methods of Clustering (featured article). IEEE Intelligent Informatics Bulletin 15(1), 15–21.

  21. Miyamoto, S., Ichihashi, H., Honda, K. (2008) Algorithms for Fuzzy Clustering: Methods in c-Means Clustering with Applications. Studies in Fuzziness and Soft Computing 229. Springer-Verlag, Berlin Heidelberg.

  22. Mullen, K.M., Ardia, D., Gil, D., Windover, D., Cline, J. (2011) DEoptim: An R Package for Global Optimization by Differential Evolution. Journal of Statistical Software 40(6), 1–26.

  23. Pryke, A., Beale, R. (2004) Interactive Comprehensible Data Mining. In: Y. Cai (ed.): Ambient Intelligence for Scientific Discovery, Springer, LNCS 3345, 48–65.

  24. R Core Team (2014) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/.

  25. Ríos, S.A., Velásquez, J.D. (2011) Finding Representative Web Pages Based on a SOM and a Reverse Cluster Analysis. International Journal on Artificial Intelligence Tools 20(1), 93–118.

  26. Stańczak, J. (2003) Biologically inspired methods for control of evolutionary algorithms. Control and Cybernetics 32(2), 411–433.

  27. Storn, R., Price, K. (1997) Differential evolution – a simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization 11(4), 341–359.

  28. Torra, V., Endo, Y., Miyamoto, S. (2011) Computationally intensive parameter selection for clustering algorithms: The case of fuzzy \(c\)-means with tolerance. International Journal of Intelligent Systems 26(4), 313–322.

  29. Zhou, Z.H. (2005) Comprehensibility of data mining algorithms. In: J. Wang (ed.): Encyclopedia of Data Warehousing and Mining, IGI Global, Hershey, 190–195.

Acknowledgements

Partially supported by the National Science Centre under Grant UMO-2012/05/B/ST6/03068.

Corresponding author

Correspondence to Sławomir Zadrożny.

Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

Owsiński, J.W., Kacprzyk, J., Opara, K., Stańczak, J., Zadrożny, S. (2017). Using a Reverse Engineering Type Paradigm in Clustering. An Evolutionary Programming Based Approach. In: Torra, V., Dahlbom, A., Narukawa, Y. (eds) Fuzzy Sets, Rough Sets, Multisets and Clustering. Studies in Computational Intelligence, vol 671. Springer, Cham. https://doi.org/10.1007/978-3-319-47557-8_9

  • DOI: https://doi.org/10.1007/978-3-319-47557-8_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-47556-1

  • Online ISBN: 978-3-319-47557-8

  • eBook Packages: Engineering, Engineering (R0)
