Abstract
Semi-supervised methods use a small amount of labeled data to guide unsupervised techniques. Recent literature shows that these methods outperform fully unsupervised ones even with a small amount of side-information. This suggests that semi-supervised methods may be especially useful in very difficult and noisy tasks where little a priori information is available, as is the case in the classification of biological datasets. The two most frequently used paradigms for including side-information in clustering are Constrained Clustering and Metric Learning. In this paper we use a Metric Learning approach as a preliminary step to fuzzy clustering, and we show that Semi-Supervised Fuzzy Clustering (SSFC) can be an effective tool for the classification of biological datasets. We used three real biological datasets and a generalized version of the Partition Entropy index to validate our results. In all cases tested, the metric learning step highlighted the clustering structure of the datasets more clearly.
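The pipeline summarized in the abstract (learn a metric from side-information, run fuzzy clustering in the learned metric, then score the partition) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the metric is a whitening transform estimated from small groups of points known to share a class, the clustering step is standard fuzzy c-means, the score is the classical partition entropy rather than the generalized index the paper uses, and the data are synthetic. The names `learn_whitening`, `fuzzy_c_means`, and `partition_entropy` are hypothetical.

```python
import numpy as np


def learn_whitening(X, chunklets, reg=1e-6):
    """Estimate a linear map W from groups of same-class points (assumed side-information).

    Euclidean distances in the mapped space X @ W.T equal Mahalanobis
    distances under the inverse of the pooled within-group covariance.
    """
    d = X.shape[1]
    C = np.zeros((d, d))
    n = 0
    for idx in chunklets:
        P = X[idx] - X[idx].mean(axis=0)  # center each group on its own mean
        C += P.T @ P
        n += len(idx)
    C = C / n + reg * np.eye(d)           # small ridge keeps C invertible
    vals, vecs = np.linalg.eigh(C)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T  # W = C^(-1/2)


def fuzzy_c_means(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means; returns memberships U (n x c) and centers V (c x d)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)     # each row of memberships sums to 1
    for _ in range(n_iter):
        Um = U ** m
        V = (Um.T @ X) / Um.sum(axis=0)[:, None]           # weighted centers
        d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)                         # avoid division by zero
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)       # membership update
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, V


def partition_entropy(U):
    """Classical partition entropy: 0 for a crisp partition, log(c) for a uniform one."""
    Uc = np.clip(U, 1e-12, 1.0)           # guard log(0)
    return float(-(U * np.log(Uc)).sum() / U.shape[0])


# Synthetic data (an assumption): two classes separated along feature 0,
# with feature 1 carrying only high-variance noise.
rng = np.random.default_rng(0)
scale = np.array([0.3, 5.0])
A = rng.normal(size=(60, 2)) * scale
B = rng.normal(size=(60, 2)) * scale + np.array([2.0, 0.0])
X = np.vstack([A, B])

# Side-information: ten points per class are known to belong together.
chunklets = [np.arange(0, 10), np.arange(60, 70)]
W = learn_whitening(X, chunklets)

U_raw, _ = fuzzy_c_means(X, c=2)          # Euclidean metric
U_ml, _ = fuzzy_c_means(X @ W.T, c=2)     # learned metric
print("partition entropy, Euclidean metric:", partition_entropy(U_raw))
print("partition entropy, learned metric:  ", partition_entropy(U_ml))
```

A lower partition entropy indicates a crisper fuzzy partition, so comparing the two printed values gives a rough sense of whether the learned metric exposes the cluster structure more clearly on this toy data.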
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ceccarelli, M., Maratea, A. (2006). Semi-supervised Fuzzy c-Means Clustering of Biological Data. In: Bloch, I., Petrosino, A., Tettamanzi, A.G.B. (eds) Fuzzy Logic and Applications. WILF 2005. Lecture Notes in Computer Science, vol 3849. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11676935_32
DOI: https://doi.org/10.1007/11676935_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-32529-1
Online ISBN: 978-3-540-32530-7
eBook Packages: Computer Science, Computer Science (R0)