What Can Fuzzy Cluster Analysis Contribute to Clustering of High-Dimensional Data?

Klawonn, Frank

doi:10.1007/978-3-319-03200-9_1

Frank Klawonn

Conference paper


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8256)

Abstract

Cluster analysis of high-dimensional data has attracted special interest in recent years. The term high-dimensional data can refer to a moderately large number of attributes (20 or more), as often occurs in database tables. But it can also mean thousands of attributes, as in genomics or proteomics, where thousands of genes or proteins are measured and are treated as attributes in some analysis tasks.

A main reason why cluster analysis of high-dimensional data differs from clustering low-dimensional data is the concentration of norm phenomenon, which states, roughly, that the distances between randomly distributed points become more and more similar, so that their relative differences shrink, as the dimension grows.
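The concentration of norm phenomenon can be observed numerically. The following is a minimal sketch, not taken from the paper: it assumes points drawn uniformly from the unit cube and Euclidean distance, and it measures the relative contrast (d_max - d_min) / d_min of the distances from one random query point to all other points. The function name `relative_contrast` and its parameters are illustrative choices.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Return (d_max - d_min) / d_min for the Euclidean distances from one
    random query point to n_points uniform random points in [0, 1]^dim.

    Assumption for illustration: uniform data in the unit cube.
    """
    rng = random.Random(seed)  # local generator, so results are reproducible
    query = [rng.random() for _ in range(dim)]
    distances = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        distances.append(math.dist(query, point))
    d_min, d_max = min(distances), max(distances)
    return (d_max - d_min) / d_min


if __name__ == "__main__":
    # The contrast is expected to shrink as the dimension grows.
    for dim in (2, 20, 1000):
        print(dim, round(relative_contrast(dim), 3))
```

Under these assumptions the nearest and the farthest point are almost equally far from the query point in 1000 dimensions, whereas in two dimensions they differ by a large factor.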

On the one hand, fuzzy cluster analysis has been shown to be less sensitive to initialisation than, for instance, the classical k-means algorithm. On the other hand, standard fuzzy clustering is more strongly affected by the concentration of norm phenomenon and tends to fail easily in high dimensions. Here we present a review of why fuzzy clustering has special problems with high-dimensional data and how this can be amended by modifying the fuzzifier concept. We also describe a recently introduced approach based on correlation and an attribute selection fuzzy clustering technique that can be applied when clusters can only be found in lower dimensions.
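To illustrate why nearly equal distances are a problem for standard fuzzy clustering, the sketch below evaluates the textbook fuzzy c-means membership formula u_i = 1 / sum_k (d_i / d_k)^(2 / (m - 1)) for one data point, where d_i is its distance to cluster centre i and m > 1 is the fuzzifier. This is only an illustration under stated assumptions: the distance values are made up, the function name `fcm_memberships` is hypothetical, and the code shows the standard formula, not the modified fuzzifier concept that the paper reviews.

```python
def fcm_memberships(distances, m=2.0):
    """Standard fuzzy c-means memberships of one point, given its distances
    to every cluster centre and the fuzzifier m (must be > 1)."""
    if m <= 1.0:
        raise ValueError("the fuzzifier m must be greater than 1")
    # A point lying exactly on one or more centres belongs only to those.
    on_centre = [d == 0.0 for d in distances]
    if any(on_centre):
        hits = sum(on_centre)
        return [1.0 / hits if hit else 0.0 for hit in on_centre]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_k) ** exponent for d_k in distances)
        for d_i in distances
    ]


if __name__ == "__main__":
    # Made-up distances: clearly different (low-dimensional situation)
    # versus nearly equal (what distance concentration produces).
    print(fcm_memberships([1.0, 2.0, 4.0]))
    print(fcm_memberships([12.9, 13.0, 13.1]))
    # A fuzzifier closer to 1 sharpens the memberships again.
    print(fcm_memberships([12.9, 13.0, 13.1], m=1.1))
```

When the distances to all centres are almost identical, the memberships are all close to 1/c for c clusters, so every point pulls on every centre with nearly the same weight. Lowering m makes the assignment crisper, which hints at why the choice of fuzzifier matters in high dimensions.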



Author information

Authors and Affiliations

Bioinformatics & Statistics, Helmholtz-Centre for Infection Research, Inhoffenstr. 7, D-38124, Braunschweig, Germany
Frank Klawonn
Department of Computer Science, Ostfalia University of Applied Sciences, Salzdahlumer Str. 46/48, D-38302, Wolfenbuettel, Germany
Frank Klawonn

Editor information

Editors and Affiliations

DIBRIS, University of Genoa, Via Dodecaneso 35, 16146, Genoa, Italy
Francesco Masulli
Dept. of Informatics, Systems, and Communication, University of Milano Bicocca, Viale Sarca 336, 20126, Milan, Italy
Gabriella Pasi
Dept. of Information Systems, Iona College, 710 North Ave, 10801, New Rochelle, NY, USA
Ronald Yager

Cite this paper

Klawonn, F. (2013). What Can Fuzzy Cluster Analysis Contribute to Clustering of High-Dimensional Data? In: Masulli, F., Pasi, G., Yager, R. (eds) Fuzzy Logic and Applications. WILF 2013. Lecture Notes in Computer Science, vol 8256. Springer, Cham. https://doi.org/10.1007/978-3-319-03200-9_1

DOI: https://doi.org/10.1007/978-3-319-03200-9_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-03199-6
Online ISBN: 978-3-319-03200-9
eBook Packages: Computer Science, Computer Science (R0)
