Regularized Gaussian Mixture Model based discretization for gene expression data association mining

Cai, Ruichu; Hao, Zhifeng; Wen, Wen; Wang, Lijuan

doi:10.1007/s10489-013-0435-7

Published: 02 April 2013

Applied Intelligence, Volume 39, pages 607–613 (2013)

Abstract

Association rules have proven useful in gene-expression-based disease diagnosis because of their good interpretability. However, the large number of rules generated from high-dimensional gene expression data is one of the main challenges in applying them. In this work, we show that discretization preprocessing is one of the causes of this rule-number explosion. To alleviate the problem, we propose a Regularized Gaussian Mixture Model (RGMM) to discretize continuous gene expression data. Under the Minimum Description Length (MDL) framework, RGMM takes into account both the complexity of the discretization model and the information lost during discretization. Extensive experiments on real-life gene expression data sets demonstrate the effectiveness of RGMM.
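To make the idea concrete, here is a minimal sketch of Gaussian-mixture discretization for a single gene, with the number of intervals chosen by a description-length criterion. It is not RGMM. It assumes the standard two-part MDL approximation L(k) = -log p(x | θ_k) + (p_k / 2) log n, with p_k = 3k - 1 free parameters (k means, k variances, k - 1 mixing weights), which equals half the BIC, and it omits the paper's information-loss term. The function names (fit_gmm_1d, mdl_code_length, discretize), the variance floor, the quantile initialization, and the toy values are hypothetical choices for illustration only.

```python
import math

# Hypothetical sketch, NOT the authors' RGMM: plain 1-D EM plus two-part MDL
# (equivalent to BIC/2) model selection, used to discretize one gene's values.


def _normal_pdf(x, mu, var):
    """Density of N(mu, var) at x."""
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def _log_likelihood(xs, weights, means, variances):
    """Log-likelihood of the data under a 1-D Gaussian mixture."""
    total = 0.0
    for x in xs:
        p = sum(w * _normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))
        total += math.log(max(p, 1e-300))  # guard against log(0)
    return total


def fit_gmm_1d(xs, k, iters=200, tol=1e-8, min_var_ratio=1e-3):
    """Fit a k-component 1-D Gaussian mixture to xs with plain EM."""
    n = len(xs)
    s = sorted(xs)
    mean_all = sum(xs) / n
    var_all = max(sum((x - mean_all) ** 2 for x in xs) / n, 1e-12)
    min_var = var_all * min_var_ratio  # assumed floor: stops components collapsing onto single points
    # Deterministic initialization: means at evenly spaced quantiles, shared variance.
    means = [s[int((j + 0.5) * n / k)] for j in range(k)]
    variances = [var_all] * k
    weights = [1.0 / k] * k
    prev_ll = -math.inf
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each point.
        resp = []
        for x in xs:
            p = [w * _normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means and (floored) variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-10:
                continue  # leave an empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var_j, min_var)
        wsum = sum(weights)
        weights = [w / wsum for w in weights]
        ll = _log_likelihood(xs, weights, means, variances)
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return weights, means, variances


def mdl_code_length(xs, params):
    """Two-part MDL approximation: -log-likelihood + (p/2) log n, with p = 3k - 1."""
    weights, means, variances = params
    k = len(means)
    n_params = 3 * k - 1  # k means, k variances, k - 1 free weights
    nll = -_log_likelihood(xs, weights, means, variances)
    return nll + 0.5 * n_params * math.log(len(xs))


def discretize(xs, max_k=5):
    """Pick k by minimum code length, then map each value to an ordered interval index."""
    best = None
    for k in range(1, max_k + 1):
        params = fit_gmm_1d(xs, k)
        score = mdl_code_length(xs, params)
        if best is None or score < best[0]:
            best = (score, params)
    weights, means, variances = best[1]
    k = len(means)
    # Relabel components so that index 0 is the lowest-mean (lowest-expression) level.
    order = sorted(range(k), key=lambda j: means[j])
    rank = {j: r for r, j in enumerate(order)}
    labels = []
    for x in xs:
        post = [w * _normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
        labels.append(rank[max(range(k), key=lambda j: post[j])])
    return k, labels


# Toy usage with made-up expression values for one gene (two well-separated groups).
low = [0.0, 0.1, -0.1, 0.2, -0.2, 0.05, -0.05, 0.15]
high = [5.0, 5.1, 4.9, 5.2, 4.8, 5.05, 4.95, 5.15]
n_intervals, levels = discretize(low + high)
print(n_intervals, levels)
```

Each extra component costs (3/2) log n nats of model description in this approximation, so a split is kept only when it buys at least that much likelihood; this is the model-complexity side of the trade-off described above.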

Acknowledgements

This work was financially supported by the Natural Science Foundation of China (61070033, 61100148, 61202269), the Natural Science Foundation of Guangdong Province (S2011040004804), the Key Technology Research and Development Programs of Guangdong Province (2010B050400011), the Opening Project of the State Key Laboratory for Novel Software Technology (KFKT2011B19), the Foundation for Distinguished Young Talents in Higher Education of Guangdong, China (LYM11060), the Science and Technology Plan Project of Guangzhou City (12C42111607, 201200000031), and the Science and Technology Plan Project of Panyu District, Guangzhou (2012-Z-03-67).

Author information

Authors and Affiliations

Faculty of Computer Science, Guangdong University of Technology, Guangzhou, P.R. China
Ruichu Cai, Zhifeng Hao, Wen Wen & Lijuan Wang
State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, P.R. China
Ruichu Cai

Corresponding author

Correspondence to Ruichu Cai.

About this article

Cite this article

Cai, R., Hao, Z., Wen, W. et al. Regularized Gaussian Mixture Model based discretization for gene expression data association mining. Appl Intell 39, 607–613 (2013). https://doi.org/10.1007/s10489-013-0435-7

Issue Date: October 2013
DOI: https://doi.org/10.1007/s10489-013-0435-7
