Abstract
Biclustering of gene expression data is a commonly used technique for studying how genes interact under particular experiments or conditions. In the study of diseases in particular, these methods are used to compare control and affected samples in order to identify the genes involved. Some biclustering algorithms require discretized input to work correctly. In this context, the choice of discretization method is extremely important and has a major impact on the outcome. In this work we analyze several discretization methods on Alzheimer's disease (AD) gene expression data and compare the results of a state-of-the-art biclustering algorithm after each discretization. The comparison reveals that biclusters obtained from discretized expression values achieve greater coverage and overall enrichment than biclusters generated from real-valued expression data. In one particular experiment, a clustering-based discretization method outperforms all competing techniques on the dataset under study, in statistical terms.
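To make the idea of clustering-based discretization concrete, the sketch below maps the expression values of a single gene to a small number of ordered levels using one-dimensional k-means. This is only an illustration under stated assumptions: the abstract does not say which clustering method, initialization, or number of levels the authors used, so k-means, the choice of k = 3, the function name kmeans_discretize, and the sample values are all hypothetical.

```python
from typing import List, Sequence


def kmeans_discretize(values: Sequence[float], k: int = 3, max_iter: int = 100) -> List[int]:
    """Map each expression value of one gene to a level (0 = lowest).

    Illustrative 1-D k-means (Lloyd's algorithm); not the paper's exact method.
    """
    lo, hi = min(values), max(values)
    if lo == hi:
        return [0] * len(values)  # constant profile: a single level

    # Assumption: deterministic start, centroids evenly spaced over the range.
    centroids = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each value goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = sum(members) / len(members)

    # Relabel so that levels follow centroid order (low to high expression).
    order = sorted(range(k), key=lambda c: centroids[c])
    rank = {c: r for r, c in enumerate(order)}
    return [rank[lab] for lab in labels]


# Made-up expression profile of one gene across seven samples.
profile = [0.1, 0.2, 0.15, 5.0, 5.2, 9.8, 10.1]
levels = kmeans_discretize(profile, k=3)
print(levels)
```

Applied row by row to an expression matrix, a routine like this would produce the discrete matrix that a biclustering algorithm takes as input. Unlike equal-width or equal-frequency binning, the level boundaries here follow the gaps in each gene's own distribution of values.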
Acknowledgements
This work was supported by CONICET (Consejo Nacional de Investigaciones Científicas y Técnicas), grant number: PIP 112-2012-0100471, and UNS (Universidad Nacional del Sur), grant number: PGI 24/N042.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Dussaut, J.S., Gallo, C.A., Carballido, J.A., Ponzoni, I. (2017). Analysis of Gene Expression Discretization Techniques in Microarray Biclustering. In: Rojas, I., Ortuño, F. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2017. Lecture Notes in Computer Science, vol 10209. Springer, Cham. https://doi.org/10.1007/978-3-319-56154-7_24
DOI: https://doi.org/10.1007/978-3-319-56154-7_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-56153-0
Online ISBN: 978-3-319-56154-7
eBook Packages: Computer Science, Computer Science (R0)