Abstract
Despite the increasing amount of available gene expression data, integrative analysis is still hindered by the data's high susceptibility to microenvironment fluctuations, which results in inter-experiment variability known as batch effects. The development of data integration strategies is therefore more necessary than ever. Although several normalization algorithms have already been proposed, we believe that an effective model must rely on data migration between schemes. In this paper, we apply this approach to a set of microarray data from core needle biopsies of breast cancers spanning different microarray platforms, and demonstrate its effectiveness in data preparation for unsupervised analysis and multiclass classification tasks. We propose a custom tool dedicated to defining the model structure. Additionally, we compare several data processing pipelines that combine data normalization with different batch effect correction methods.
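To make the idea of a pipeline that combines normalization with batch effect correction concrete, the following is a minimal sketch in Python. It is not the method or tool used in the paper: quantile normalization followed by per-batch mean centering is assumed here only as one simple, generic combination, and the function names `quantile_normalize` and `center_batches`, the synthetic data, and the batch labels are all hypothetical.

```python
import numpy as np


def quantile_normalize(x):
    """Give every sample (column) the same empirical distribution.

    x is a genes x samples matrix. Ties are resolved by argsort order,
    which is adequate for continuous expression values.
    """
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, axis=0)                 # per-column sort order
    ranks = np.argsort(order, axis=0)             # rank of each value in its column
    reference = np.sort(x, axis=0).mean(axis=1)   # mean of the k-th smallest values
    return reference[ranks]


def center_batches(x, batches):
    """Subtract each batch's per-gene mean, then restore the overall per-gene mean.

    This is plain mean centering, assumed here as an illustrative
    batch correction step; it is not an empirical Bayes method.
    """
    x = np.asarray(x, dtype=float)
    batches = np.asarray(batches)
    corrected = x.copy()
    grand_mean = x.mean(axis=1, keepdims=True)
    for b in np.unique(batches):
        cols = batches == b
        corrected[:, cols] -= x[:, cols].mean(axis=1, keepdims=True)
    return corrected + grand_mean


# Synthetic example: 50 genes, 6 samples, the last 3 shifted by a batch offset.
rng = np.random.default_rng(0)
expr = rng.normal(size=(50, 6))
expr[:, 3:] += 2.0
labels = np.array(["A", "A", "A", "B", "B", "B"])

integrated = center_batches(quantile_normalize(expr), labels)
```

After both steps, every sample shares the same value distribution before centering, and the per-gene means of the two hypothetical batches coincide. Mean centering of this kind also removes any real biological difference that is confounded with batch, which is one reason comparisons of alternative correction methods matter.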
Acknowledgement
This work was supported by the Polish National Centre for Research and Development under Grant Strategmed2/267398/4/NCBR/2015, by Silesian University of Technology Grant 02/010/BK_18/0102, and by the Polish Ministry of Science and Higher Education as part of the Implementation Doctorate program at the Silesian University of Technology, Gliwice, Poland (contract No 10/DW/2017/01/1) (AP). Data analysis was partially carried out using the Biotest Platform developed within Project No. PBS3/B3/32/2015 financed by the Polish National Centre of Research and Development (NCBiR). Calculations were performed using the infrastructure supported by the computer cluster Ziemowit (www.ziemowit.hpc.polsl.pl) funded by the Silesian BIO-FARMA project No. POIG.02.01.00-00-166/08 and expanded in the POIG.02.03.01-00-040/13 project in the Computational Biology and Bioinformatics Laboratory of the Biotechnology Centre at the Silesian University of Technology.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Student, S., Płuciennik, A., Łakomiec, K., Wilk, A., Bensz, W., Fujarewicz, K. (2019). Integration Strategies of Cross-Platform Microarray Data Sets in Multiclass Classification Problem. In: Misra, S., et al. Computational Science and Its Applications – ICCSA 2019. ICCSA 2019. Lecture Notes in Computer Science, vol 11623. Springer, Cham. https://doi.org/10.1007/978-3-030-24308-1_48
DOI: https://doi.org/10.1007/978-3-030-24308-1_48
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-24307-4
Online ISBN: 978-3-030-24308-1
eBook Packages: Computer Science, Computer Science (R0)