Abstract
Merging gene expression datasets is a simple way to increase the number of samples in an analysis. However, experimental and data processing conditions, which are specific to each dataset, generally influence the expression values and can hide the biological effect of interest. It is therefore important to normalize the larger merged dataset with respect to these batch effects, as failing to adjust for them may adversely impact statistical inference. In this context, we propose to use a “spatiotemporal” independent component analysis to model the influence of those unwanted effects and remove them from the data. We show on a real dataset that our method improves this modeling and helps to improve sample classification.
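The general idea of modeling unwanted effects as components and removing them can be sketched as follows. This is only an illustration under stated assumptions, not the method of the chapter: a plain SVD stands in for the spatiotemporal independent component analysis so that the example needs nothing beyond NumPy, and the function name `remove_batch_components`, the `r2_threshold` parameter, and the synthetic data are all hypothetical. A component is treated as batch-related when its sample profile is well explained (high R²) by batch membership, and such components are dropped before the data are reconstructed.

```python
import numpy as np

def remove_batch_components(X, batch, r2_threshold=0.5):
    """Drop components whose sample profile is explained by batch labels.

    X     : genes x samples expression matrix
    batch : one batch label per sample
    NOTE: SVD is a stand-in for ICA here (an assumption of this sketch).
    """
    batch = np.asarray(batch)
    gene_means = X.mean(axis=1, keepdims=True)
    Xc = X - gene_means                          # centre each gene across samples

    # Decompose Xc = U @ diag(s) @ Vt; rows of Vt are sample profiles.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

    # One-hot design matrix: one column per batch.
    labels = np.unique(batch)
    D = (batch[:, None] == labels[None, :]).astype(float)

    keep = np.ones(len(s), dtype=bool)
    for k in range(len(s)):
        if s[k] < 1e-10 * s[0]:
            continue                             # numerically empty component
        v = Vt[k]
        # R^2 of the least-squares fit of the sample profile on batch membership.
        fitted = D @ np.linalg.lstsq(D, v, rcond=None)[0]
        r2 = 1.0 - np.sum((v - fitted) ** 2) / np.sum((v - v.mean()) ** 2)
        if r2 > r2_threshold:
            keep[k] = False                      # flagged as batch-related

    # Rebuild the data from the retained components only.
    X_clean = (U[:, keep] * s[keep]) @ Vt[keep] + gene_means
    return X_clean, np.flatnonzero(~keep)

# Synthetic example: two batches, the second carrying a gene-specific shift.
rng = np.random.default_rng(0)
n_genes, n_samples = 200, 40
batch = np.repeat([0, 1], n_samples // 2)
noise = 0.5 * rng.standard_normal((n_genes, n_samples))
offset = 2.0 * rng.standard_normal((n_genes, 1))
X = noise + offset * (batch == 1)

X_clean, removed = remove_batch_components(X, batch)
```

In this toy setting the batch shift dominates the variance, so it concentrates in a single component. On real merged datasets the batch signal may be spread over several components or entangled with biology, which is the situation a more careful decomposition and selection rule would have to address.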
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Renard, E., Branders, S., Absil, PA. (2016). Independent Component Analysis to Remove Batch Effects from Merged Microarray Datasets. In: Frith, M., Storm Pedersen, C. (eds) Algorithms in Bioinformatics. WABI 2016. Lecture Notes in Computer Science, vol 9838. Springer, Cham. https://doi.org/10.1007/978-3-319-43681-4_23
DOI: https://doi.org/10.1007/978-3-319-43681-4_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-43680-7
Online ISBN: 978-3-319-43681-4
eBook Packages: Computer Science (R0)