Integration Strategies of Cross-Platform Microarray Data Sets in Multiclass Classification Problem

  • Conference paper
Computational Science and Its Applications – ICCSA 2019 (ICCSA 2019)

Abstract

Despite the increasing amount of available gene expression data, integrative analysis is still hindered by the data's high susceptibility to microenvironment fluctuations, which results in inter-experiment variability known as batch effects. The development of data integration strategies is therefore more necessary than ever. Although several normalization algorithms have already been proposed, we believe that an effective model must rely on data migration between schemes. In this paper we apply this approach to a set of microarray data from core needle biopsies of breast cancers spanning different microarray platforms, and demonstrate its effectiveness in preparing data for unsupervised analysis and multiclass classification tasks. We propose a custom tool dedicated to defining the model structure. Additionally, we compare several data processing pipelines that combine data normalization with different batch effect correction methods.
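To make the idea of a pipeline that combines normalization with batch effect correction concrete, the sketch below chains two generic steps: quantile normalization followed by per-batch mean-centering of each gene. It is only an illustration under stated assumptions and is not the method evaluated in the paper. The function names, the synthetic data, and the choice of mean-centering as the correction step are all hypothetical, and mean-centering can also remove biological signal when class labels are confounded with batch.

```python
import numpy as np


def quantile_normalize(x):
    """Give every sample (column) the same empirical distribution.

    x has shape (genes, samples). Ties are broken arbitrarily by argsort.
    """
    order = np.argsort(x, axis=0)
    ranks = np.argsort(order, axis=0)          # rank of each value within its column
    mean_quantiles = np.sort(x, axis=0).mean(axis=1)
    return mean_quantiles[ranks]


def center_by_batch(x, batches):
    """Subtract each gene's mean within every batch (hypothetical correction step)."""
    out = x.astype(float).copy()
    for b in np.unique(batches):
        cols = batches == b
        out[:, cols] -= out[:, cols].mean(axis=1, keepdims=True)
    return out


# Synthetic example: 50 genes, 6 samples, batch "B" shifted upwards by 2.0.
rng = np.random.default_rng(0)
expr = rng.normal(size=(50, 6))
expr[:, 3:] += 2.0
batches = np.array(["A", "A", "A", "B", "B", "B"])

normalized = quantile_normalize(expr)
corrected = center_by_batch(normalized, batches)
```

Swapping `center_by_batch` for another correction step, or changing the order of the two calls, gives a different pipeline, which is the kind of comparison the abstract refers to.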



Acknowledgement

This work was supported by the Polish National Centre for Research and Development under Grant Strategmed2/267398/4/NCBR/2015, by Silesian University of Technology Grant 02/010/BK_18/0102, and by the Polish Ministry of Science and Higher Education as part of the Implementation Doctorate program at the Silesian University of Technology, Gliwice, Poland (contract No 10/DW/2017/01/1) (AP). Data analysis was partially carried out using the Biotest Platform developed within Project n. PBS3/B3/32/2015 financed by the Polish National Centre of Research and Development (NCBiR). Calculations were performed using the infrastructure supported by the computer cluster Ziemowit (www.ziemowit.hpc.polsl.pl), funded by the Silesian BIO-FARMA project No. POIG.02.01.00-00-166/08 and expanded in the POIG.02.03.01-00-040/13 project in the Computational Biology and Bioinformatics Laboratory of the Biotechnology Centre at the Silesian University of Technology.

Author information

Corresponding author

Correspondence to Sebastian Student.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Student, S., Płuciennik, A., Łakomiec, K., Wilk, A., Bensz, W., Fujarewicz, K. (2019). Integration Strategies of Cross-Platform Microarray Data Sets in Multiclass Classification Problem. In: Misra, S., et al. Computational Science and Its Applications – ICCSA 2019. ICCSA 2019. Lecture Notes in Computer Science, vol 11623. Springer, Cham. https://doi.org/10.1007/978-3-030-24308-1_48

  • DOI: https://doi.org/10.1007/978-3-030-24308-1_48

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-24307-4

  • Online ISBN: 978-3-030-24308-1

  • eBook Packages: Computer Science, Computer Science (R0)
