Empirical study for the agreement between statistical methods in quality assessment and control of microarray data

  • Original Paper
  • Computational Statistics

Abstract

Microarray data quality can affect every step of the microarray analysis process, so quality assessment and control are an integral part of that process. They detect measurements that diverge beyond the acceptable level of random fluctuation. This empirical study identifies the associations and correlations among the six quality assessment methods for microarray outlier detection used in version 2.2.2 of the arrayQualityMetrics package. For the evaluation, two agreement statistics (Cohen's kappa, applied after a marginal homogeneity criterion, and Gwet's AC1 statistic) and the Pearson correlation coefficient were applied to real microarray data from the public ArrayExpress database. The results show that four of the six currently proposed statistical methods suffice to comprehensively quantify the quality information in large series of microarrays. This saves computation time and reduces decision complexity for the analyst. The newly proposed rule is validated with data sets from biomedical studies.
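The agreement statistics named in the abstract can be illustrated for a single pair of outlier detection methods. The sketch below is not the authors' code. It assumes that each method returns one binary outlier call per array, uses McNemar's statistic as a stand-in for the marginal homogeneity criterion (the abstract does not say which test was used), and runs on hypothetical flag vectors. For binary calls, the Pearson correlation coefficient reduces to the phi coefficient.

```python
from math import nan, sqrt
from typing import Sequence


def contingency(flags_a: Sequence[bool], flags_b: Sequence[bool]) -> tuple[int, int, int, int]:
    """Cross-tabulate two methods' outlier calls: (both, only A, only B, neither)."""
    if len(flags_a) != len(flags_b):
        raise ValueError("both methods must rate the same arrays")
    pairs = list(zip(flags_a, flags_b))
    a = sum(1 for x, y in pairs if x and y)          # flagged by both
    b = sum(1 for x, y in pairs if x and not y)      # flagged by A only
    c = sum(1 for x, y in pairs if not x and y)      # flagged by B only
    d = len(pairs) - a - b - c                       # flagged by neither
    return a, b, c, d


def agreement(flags_a: Sequence[bool], flags_b: Sequence[bool]) -> dict[str, float]:
    """Kappa, AC1, McNemar statistic and phi for two binary outlier calls."""
    a, b, c, d = contingency(flags_a, flags_b)
    n = a + b + c + d
    if n == 0:
        raise ValueError("no arrays to compare")
    p_obs = (a + d) / n          # observed proportion of agreement
    p_a = (a + b) / n            # share of arrays flagged by method A
    p_b = (a + c) / n            # share of arrays flagged by method B

    # Cohen's kappa: chance agreement from each method's own marginals.
    pe_kappa = p_a * p_b + (1 - p_a) * (1 - p_b)
    kappa = (p_obs - pe_kappa) / (1 - pe_kappa) if pe_kappa < 1 else nan

    # Gwet's AC1 (two raters, two categories): chance agreement from the
    # mean flagging rate pi, i.e. 2 * pi * (1 - pi), which never exceeds 0.5.
    pi = (p_a + p_b) / 2
    pe_ac1 = 2 * pi * (1 - pi)
    ac1 = (p_obs - pe_ac1) / (1 - pe_ac1)

    # McNemar statistic without continuity correction (assumed homogeneity
    # check); compare with the chi-square(1) critical value 3.84 at alpha 0.05.
    mcnemar = (b - c) ** 2 / (b + c) if b + c > 0 else 0.0

    # Pearson correlation of 0/1 flags, i.e. the phi coefficient.
    denom = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    phi = (a * d - b * c) / denom if denom > 0 else nan

    return {"kappa": kappa, "ac1": ac1, "mcnemar": mcnemar, "phi": phi}


if __name__ == "__main__":
    # Hypothetical outlier calls for ten arrays (illustrative data only).
    method_a = [True, True, False, False, False, False, False, False, False, True]
    method_b = [True, False, False, False, False, False, False, False, False, True]
    for name, value in agreement(method_a, method_b).items():
        print(f"{name}: {value:.3f}")
```

For the hypothetical flags in the example, kappa is about 0.74 while AC1 is 0.84. AC1 estimates chance agreement from the pooled flagging rate instead of each method's own marginals, which makes it less sensitive to the low outlier prevalence typical of microarray series. Kappa's known sensitivity to unequal marginals is one plausible reason for checking marginal homogeneity before relying on it.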

Author information

Corresponding author

Correspondence to Markus Schmidberger.

About this article

Cite this article

Schmidberger, M., Vicedo, E. & Mansmann, U. Empirical study for the agreement between statistical methods in quality assessment and control of microarray data. Comput Stat 26, 259–277 (2011). https://doi.org/10.1007/s00180-010-0216-2

  • DOI: https://doi.org/10.1007/s00180-010-0216-2