Abstract
Parametric Embedding (PE) has recently been proposed as a general-purpose algorithm for class visualisation. It takes class posteriors produced by a mixture-based clustering algorithm and projects them into 2D for visualisation. Although this fully modular combination of objectives (clustering and projection) is attractive for its conceptual simplicity, we show that for high-dimensional data a better combination can be achieved by integrating both objectives into a single consistent probabilistic model. The projection step then acts as a regulariser, guarding against the curse of dimensionality. As a result, the trade-off between clustering and visualisation turns out to enhance the predictive ability of the overall model. We present results on synthetic data and on two real-world high-dimensional data sets: observed spectra of early-type galaxies and gene expression arrays.
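To make the modular baseline concrete, the following is a minimal sketch of the PE projection step alone, not of the integrated model proposed in this paper. It assumes the embedding-space posterior takes the unit-variance Gaussian form q(c | r_n) ∝ p(c) exp(-½ ||r_n - φ_c||²) and that the coordinates are fitted by minimising the summed KL divergence from the given posteriors. The function names, the estimate of p(c) as the mean posterior, and the step-halving gradient descent are illustrative assumptions rather than details taken from the source.

```python
import math
import random


def embedding_posteriors(R, Phi, log_prior):
    """q(c | r_n) proportional to p(c) * exp(-0.5 * ||r_n - phi_c||^2)."""
    Q = []
    for r in R:
        scores = [lp - 0.5 * sum((a - b) ** 2 for a, b in zip(r, phi))
                  for lp, phi in zip(log_prior, Phi)]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        Q.append([w / total for w in weights])
    return Q


def kl_objective(P, Q, eps=1e-12):
    """Sum over points of KL(p(. | x_n) || q(. | r_n))."""
    return sum(p * (math.log(p + eps) - math.log(q + eps))
               for p_row, q_row in zip(P, Q)
               for p, q in zip(p_row, q_row))


def parametric_embedding(P, dim=2, steps=200, lr=0.1, seed=0):
    """Fit point coordinates R and class centres Phi to the posteriors P.

    P is a list of rows, each a class posterior summing to one.
    Assumption: the class prior p(c) is estimated as the mean posterior.
    """
    rng = random.Random(seed)
    n_points, n_classes = len(P), len(P[0])
    prior = [sum(row[c] for row in P) / n_points for c in range(n_classes)]
    log_prior = [math.log(p + 1e-12) for p in prior]
    # Small random start: the all-zero configuration has zero gradient.
    R = [[rng.gauss(0, 0.01) for _ in range(dim)] for _ in range(n_points)]
    Phi = [[rng.gauss(0, 0.01) for _ in range(dim)] for _ in range(n_classes)]
    loss = kl_objective(P, embedding_posteriors(R, Phi, log_prior))
    history = [loss]
    for _ in range(steps):
        Q = embedding_posteriors(R, Phi, log_prior)
        grad_R = [[0.0] * dim for _ in range(n_points)]
        grad_Phi = [[0.0] * dim for _ in range(n_classes)]
        for n in range(n_points):
            for c in range(n_classes):
                a = P[n][c] - Q[n][c]
                for d in range(dim):
                    g = a * (R[n][d] - Phi[c][d])
                    grad_R[n][d] += g    # dE/dr_n
                    grad_Phi[c][d] -= g  # dE/dphi_c
        R_new = [[x - lr * g for x, g in zip(row, grow)]
                 for row, grow in zip(R, grad_R)]
        Phi_new = [[x - lr * g for x, g in zip(row, grow)]
                   for row, grow in zip(Phi, grad_Phi)]
        loss_new = kl_objective(
            P, embedding_posteriors(R_new, Phi_new, log_prior))
        if loss_new <= loss:
            R, Phi, loss = R_new, Phi_new, loss_new  # accept the step
        else:
            lr *= 0.5  # reject the step and retry with a smaller one
        history.append(loss)
    return R, Phi, history
```

Calling `parametric_embedding(P)` on a list of posterior rows returns 2D coordinates for the points and the class centres, together with the objective value after each iteration. Because the posteriors are fixed inputs here, the high-dimensional clustering cannot benefit from the projection, which is the limitation the integrated model is meant to address.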
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kabán, A., Sun, J., Raychaudhury, S., Nolan, L. (2006). On Class Visualisation for High Dimensional Data: Exploring Scientific Data Sets. In: Todorovski, L., Lavrač, N., Jantke, K.P. (eds) Discovery Science. DS 2006. Lecture Notes in Computer Science, vol 4265. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11893318_15
DOI: https://doi.org/10.1007/11893318_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-46491-4
Online ISBN: 978-3-540-46493-8
eBook Packages: Computer Science, Computer Science (R0)