Abstract
In this paper, we propose a novel Bayesian nonparametric approach to simultaneous clustering and localized feature selection for unsupervised learning. The proposed model is based on a Dirichlet process mixture of generalized Dirichlet (GD) distributions, which can also be seen as an infinite GD mixture model. Because the approach is Bayesian and nonparametric, it guards against both overfitting and underfitting. Moreover, the number of clusters does not have to be fixed in advance, since the model assumes an infinite number of clusters. The model parameters and the local feature saliencies are estimated simultaneously by variational inference. We report experimental results of applying our model to two challenging clustering problems: web pages, and tissue samples described by gene expression data.
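To make the "infinite number of clusters" idea concrete, the sketch below draws mixing weights from a truncated stick-breaking construction, one common representation of a Dirichlet process. The abstract does not say which representation the authors use, so this is an illustrative assumption rather than the paper's algorithm. The function name, the concentration value `alpha`, and the truncation level are all hypothetical.

```python
import random


def stick_breaking_weights(alpha, truncation, seed=0):
    """Draw mixing weights from a truncated stick-breaking construction.

    alpha: concentration parameter of the Dirichlet process (assumed value).
    truncation: number of components kept (hypothetical truncation level).
    """
    rng = random.Random(seed)
    weights = []
    remaining = 1.0  # length of the stick not yet assigned to a component
    for _ in range(truncation - 1):
        fraction = rng.betavariate(1.0, alpha)  # v_k ~ Beta(1, alpha)
        weights.append(fraction * remaining)    # pi_k = v_k * prod_{j<k} (1 - v_j)
        remaining *= 1.0 - fraction
    weights.append(remaining)  # last component absorbs the leftover mass
    return weights


weights = stick_breaking_weights(alpha=1.0, truncation=10)
```

With a small `alpha`, most of the mass tends to fall on the first few components, which is how such a model can leave many clusters effectively unused instead of requiring the number of clusters to be chosen up front.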
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fan, W., Bouguila, N. (2012). Nonparametric Localized Feature Selection via a Dirichlet Process Mixture of Generalized Dirichlet Distributions. In: Huang, T., Zeng, Z., Li, C., Leung, C.S. (eds) Neural Information Processing. ICONIP 2012. Lecture Notes in Computer Science, vol 7665. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34487-9_4
DOI: https://doi.org/10.1007/978-3-642-34487-9_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34486-2
Online ISBN: 978-3-642-34487-9
eBook Packages: Computer Science, Computer Science (R0)