Abstract
For huge high-dimensional datasets, it is important to investigate and visualize the connectivity of patterns within large, arbitrarily shaped clusters. While density-based or distance-relatedness-based clustering algorithms can efficiently discover clusters of arbitrary shapes and densities, classical (yet less efficient) clustering algorithms can be used to analyze and visualize the internal cluster structure. This work proposes a sequential ensemble that applies an efficient distance-relatedness-based clustering algorithm, “Mitosis”, followed by the centre-based K-means algorithm. K-means segments the clusters obtained by Mitosis into a number of subclusters. Applied to gene expression sets, the ensemble reveals the gradual change of patterns.
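The two-stage idea can be illustrated with a minimal sketch. It is not the author's implementation: Mitosis is not reproduced here, so stage 1 uses a simple stand-in that links points lying within a fixed radius and takes the connected components as clusters. The function names, the `radius` parameter, the subcluster count `k`, and the toy data are all assumptions made for illustration.

```python
import math
import random

def connected_clusters(points, radius):
    """Stand-in for stage 1 (NOT Mitosis): link points closer than `radius`
    and return the connected components as lists of point indices."""
    n = len(points)
    labels = [-1] * n
    clusters = []
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = len(clusters)
        members, stack = [], [start]
        while stack:
            i = stack.pop()
            members.append(i)
            for j in range(n):
                if labels[j] == -1 and math.dist(points[i], points[j]) <= radius:
                    labels[j] = len(clusters)
                    stack.append(j)
        clusters.append(members)
    return clusters

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's K-means; returns one subcluster label per point."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centre for every point.
        labels = [min(range(k), key=lambda c: math.dist(p, centres[c]))
                  for p in points]
        # Update step: move each non-empty centre to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centres[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels

def sequential_ensemble(points, radius, k):
    """Stage 1 finds clusters; stage 2 splits each into at most k subclusters.
    Returns a (cluster_id, subcluster_id) pair per point."""
    result = [None] * len(points)
    for cid, members in enumerate(connected_clusters(points, radius)):
        sub_pts = [points[i] for i in members]
        sub_labels = kmeans(sub_pts, min(k, len(sub_pts)))
        for i, s in zip(members, sub_labels):
            result[i] = (cid, s)
    return result

# Toy data (hypothetical): two elongated, well-separated clusters.
points = [(float(x), 0.0) for x in range(10)] + \
         [(float(x), 100.0) for x in range(10)]
labels = sequential_ensemble(points, radius=1.5, k=3)
```

In this sketch, consecutive subclusters along an elongated cluster give a coarse view of how patterns change from one end of the cluster to the other, which is the kind of internal structure the ensemble is meant to expose.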
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yousri, N.A. (2010). A Multi-objective Sequential Ensemble for Cluster Structure Analysis and Visualization and Application to Gene Expression. In: El Gayar, N., Kittler, J., Roli, F. (eds) Multiple Classifier Systems. MCS 2010. Lecture Notes in Computer Science, vol 5997. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12127-2_28
DOI: https://doi.org/10.1007/978-3-642-12127-2_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12126-5
Online ISBN: 978-3-642-12127-2
eBook Packages: Computer Science (R0)