Abstract
The volume of biological data is increasing, and its analysis is becoming one of the most challenging topics in the information sciences. Before starting the analysis, it is important to remove unwanted variability caused by factors such as year of sequencing, laboratory conditions, and the use of different protocols. This step is crucial: if this variability is not assessed before the analysis of interest begins, the results may be misleading and the conclusions may not hold. The literature suggests several valid mathematical models, but experience shows that applying them to high-throughput data with a non-uniform study design is not straightforward and in many cases may introduce a false signal. It is therefore necessary to develop models that remove the effects that can negatively influence the study while preserving biological meaning. In this paper we report a new case study on lymphoma methylation data and propose a suitable pipeline for its analysis.
G. Pontali and L. Cascione—Equal contribution.
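To make the idea in the abstract concrete, the following is a minimal sketch of one generic way to remove a batch effect while keeping group differences: fit a per-feature linear model with both group and batch terms, then subtract only the estimated batch term. It is not the pipeline proposed in the paper, which is not described on this page. The function name `remove_batch_keep_group`, the toy data, and the effect sizes are all illustrative assumptions.

```python
import numpy as np


def remove_batch_keep_group(values, batch, group):
    """Subtract the fitted batch term from each feature, leaving group terms.

    values: array of shape (n_features, n_samples)
    batch, group: label arrays of length n_samples
    Hypothetical helper for illustration only; not the authors' method.
    """
    values = np.asarray(values, dtype=float)
    batch = np.asarray(batch)
    group = np.asarray(group)

    def indicators(labels):
        # One 0/1 column per level, skipping the first (reference) level.
        levels = np.unique(labels)[1:]
        return (labels[:, None] == levels[None, :]).astype(float)

    g = indicators(group)
    b = indicators(batch)
    # Centre the batch columns so the correction does not shift the overall mean.
    b = b - b.mean(axis=0)

    n_samples = values.shape[1]
    design = np.column_stack([np.ones(n_samples), g, b])
    # Least-squares fit of all features at once: design @ coef ~ values.T
    coef, *_ = np.linalg.lstsq(design, values.T, rcond=None)
    batch_coef = coef[1 + g.shape[1]:, :]
    return values - (b @ batch_coef).T


# Toy, noise-free data with an unbalanced (non-uniform) design.
group = np.array(["A", "A", "A", "B", "B", "B", "B", "A"])
batch = np.array([1, 1, 1, 1, 2, 2, 2, 2])
signal = 2.0 * (group == "B") + 5.0 * (batch == 2)  # assumed effect sizes
values = np.vstack([0.3 + signal, 0.6 + signal])    # two toy features
corrected = remove_batch_keep_group(values, batch, group)
```

In this noise-free toy case the batch shift is removed exactly and the group difference is left intact. With real, noisy data and a strongly unbalanced design, the batch and group terms are partly confounded, so a correction of this kind should be checked carefully before downstream analysis.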
Notes
1. The Bioconductor repository provides tools for the analysis and comprehension of high-throughput genomic data. It has 1560 software packages. The current release of Bioconductor is version 3.7.
2. Single-stranded fragments of DNA that are complementary to a gene.
3. Pertaining to a chromosome that is not a sex chromosome.
Acknowledgments
This work has been partially supported by the following projects: GNCS-INDAM, Fondo Sociale Europeo, and National Research Council Flagship Projects Interomics. It has also been partially supported by the Italian Ministry of Education, Universities and Research (MIUR) project “Dipartimenti di Eccellenza 2018-2022”.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Pontali, G., Cascione, L., Arribas, A.J., Rinaldi, A., Bertoni, F., Giugno, R. (2019). A Reliable Method to Remove Batch Effects Maintaining Group Differences in Lymphoma Methylation Case Study. In: Rojas, I., Valenzuela, O., Rojas, F., Ortuño, F. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2019. Lecture Notes in Computer Science(), vol 11466. Springer, Cham. https://doi.org/10.1007/978-3-030-17935-9_3
DOI: https://doi.org/10.1007/978-3-030-17935-9_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-17934-2
Online ISBN: 978-3-030-17935-9
eBook Packages: Computer Science, Computer Science (R0)