Abstract
The medoid-based clustering algorithm Partitioning Around Medoids (PAM) is preferred over the centroid-based k-means because it is more robust to noisy data and outliers. However, PAM can fail to recognize relatively small clusters even in situations where good partitions around medoids clearly exist. PAM also needs O(k(n-k)^2) operations to cluster a given dataset, which is computationally prohibitive for large n and k. In this paper, we propose a new bisecting k-medoids algorithm that is capable of grouping co-expressed genes together with better clustering quality and time performance. The proposed algorithm is evaluated on three gene expression datasets that contain noise components. It takes less computation time than PAM while achieving comparable clustering performance.
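The abstract does not spell out the steps of the proposed algorithm, so the following is only a minimal sketch of the general bisecting idea applied to medoids, not the authors' method. It assumes Euclidean distance, a simple alternating assign-and-update 2-medoids step in place of PAM's full swap search, and a rule that always splits the cluster with the largest within-cluster cost. All function names (`dist`, `cost`, `best_medoid`, `bisect`, `bisecting_k_medoids`) are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]
Cluster = Tuple[List[int], int]  # (member indices, medoid index)


def dist(a: Point, b: Point) -> float:
    """Euclidean distance (an assumption; any dissimilarity could be used)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def cost(points: Sequence[Point], members: List[int], medoid: int) -> float:
    """Sum of distances from every member to the given medoid."""
    return sum(dist(points[i], points[medoid]) for i in members)


def best_medoid(points: Sequence[Point], members: List[int]) -> int:
    """Member that minimizes the within-cluster cost (quadratic in cluster size)."""
    return min(members, key=lambda m: cost(points, members, m))


def bisect(points: Sequence[Point], members: List[int],
           rng: random.Random, max_iter: int = 50) -> Tuple[Cluster, Cluster]:
    """Split one cluster in two by alternating assignment and medoid update."""
    m1, m2 = rng.sample(members, 2)
    for _ in range(max_iter):
        g1, g2 = [], []
        for i in members:
            # Each medoid stays in its own group so neither group is empty.
            if i == m1:
                g1.append(i)
            elif i == m2:
                g2.append(i)
            elif dist(points[i], points[m1]) <= dist(points[i], points[m2]):
                g1.append(i)
            else:
                g2.append(i)
        n1, n2 = best_medoid(points, g1), best_medoid(points, g2)
        if (n1, n2) == (m1, m2):
            break  # medoids stopped changing
        m1, m2 = n1, n2
    return (g1, m1), (g2, m2)


def bisecting_k_medoids(points: Sequence[Point], k: int,
                        seed: int = 0) -> Tuple[List[int], List[int]]:
    """Return (labels, medoid indices) after repeated bisection into k clusters."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    everyone = list(range(len(points)))
    clusters: List[Cluster] = [(everyone, best_medoid(points, everyone))]
    while len(clusters) < k:
        # Assumed selection rule: split the splittable cluster with the largest cost.
        splittable = [c for c in clusters if len(c[0]) >= 2]
        target = max(splittable, key=lambda c: cost(points, c[0], c[1]))
        clusters.remove(target)
        clusters.extend(bisect(points, target[0], rng))
    labels = [0] * len(points)
    for label, (members, _) in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels, [medoid for _, medoid in clusters]
```

Each bisection only looks at the points of the cluster being split, which is the intuition behind the reduced running time relative to searching over all k medoids at once. Calling `bisecting_k_medoids(points, k)` on a list of expression profiles returns one label per profile and the indices of the chosen medoids.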
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kashef, R., Kamel, M.S. (2008). Efficient Bisecting k-Medoids and Its Application in Gene Expression Analysis. In: Campilho, A., Kamel, M. (eds) Image Analysis and Recognition. ICIAR 2008. Lecture Notes in Computer Science, vol 5112. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69812-8_42
DOI: https://doi.org/10.1007/978-3-540-69812-8_42
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69811-1
Online ISBN: 978-3-540-69812-8
eBook Packages: Computer Science, Computer Science (R0)