Abstract
In this paper, we introduce an approach to integrating prior knowledge into cluster analysis that differs from existing semi-supervised clustering methods. To aid the discovery of alternative structures present in the data, we use knowledge of an existing complete classification of those data. The proposed approach is based on our Multi-Objective Clustering Ensemble algorithm (MOCLE), which generates a concise and stable set of partitions representing different trade-offs between several measures of partition quality. The prior knowledge is automatically integrated into MOCLE by embedding it in one of the objective functions. That function outputs the quality of a partition with respect to one of the known structures of the data.
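The idea of embedding prior knowledge in one objective can be illustrated with a minimal sketch. The abstract does not say which quality measures MOCLE uses, so everything here is an assumption made for illustration: within-cluster variance stands in for an unsupervised quality measure, one minus the adjusted Rand index stands in for the knowledge-based objective, and all function names are hypothetical. The sketch scores candidate partitions on both objectives and keeps the non-dominated ones, which is one simple way to obtain a set of partitions representing different trade-offs.

```python
from collections import Counter
from math import comb


def adjusted_rand(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same objects."""
    total = comb(len(labels_a), 2)
    if total == 0:  # fewer than two objects: nothing to compare
        return 1.0
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / total
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both labelings trivial
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def within_cluster_variance(points, labels):
    """Sum of squared distances to cluster means, for 1-D points (lower is better)."""
    total = 0.0
    for cluster in set(labels):
        members = [p for p, l in zip(points, labels) if l == cluster]
        mean = sum(members) / len(members)
        total += sum((p - mean) ** 2 for p in members)
    return total


def objectives(points, labels, known_classes):
    """Both objectives are minimized; the second one carries the prior knowledge."""
    # Assumption: these two measures are stand-ins, not MOCLE's actual objectives.
    return (
        within_cluster_variance(points, labels),
        1.0 - adjusted_rand(labels, known_classes),
    )


def dominates(f, g):
    """True if objective vector f is no worse than g everywhere and better somewhere."""
    return all(x <= y for x, y in zip(f, g)) and any(x < y for x, y in zip(f, g))


def pareto_front(candidates, points, known_classes):
    """Return the candidate partitions that no other candidate dominates."""
    scored = [(labels, objectives(points, labels, known_classes)) for labels in candidates]
    return [labels for labels, f in scored if not any(dominates(g, f) for _, g in scored)]
```

With a known two-class structure, a candidate that matches it exactly and a candidate that splits one class further can both survive: the first is best on the knowledge-based objective, the second on variance. Candidates that are worse on both objectives are discarded.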
This work was supported by FAPESP and CNPq.
© 2007 Springer-Verlag Berlin Heidelberg
Cite this paper
Faceli, K., de Carvalho, A.C.P.L.F., de Souto, M.C.P. (2007). Multi-Objective Clustering Ensemble with Prior Knowledge. In: Sagot, M.-F., Walter, M.E.M.T. (eds) Advances in Bioinformatics and Computational Biology. BSB 2007. Lecture Notes in Computer Science, vol 4643. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73731-5_4
DOI: https://doi.org/10.1007/978-3-540-73731-5_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73730-8
Online ISBN: 978-3-540-73731-5
eBook Packages: Computer Science, Computer Science (R0)