Abstract
Because microarray data quality can affect every step of the microarray analysis process, quality assessment and control are an integral part of that process. Quality assessment detects measurements that diverge beyond the acceptable level of random fluctuation. This empirical study identifies association and correlation between the six quality assessment methods for microarray outlier detection used in the arrayQualityMetrics package, version 2.2.2. For the evaluation, two agreement tests (Cohen’s Kappa, applied after a marginal homogeneity criterion, and the AC1 statistic), the Pearson correlation coefficient, and realistic microarray data from the public ArrayExpress database were used. The results show that the quality of a data set can be assessed using only four of the six currently proposed statistical methods while still comprehensively quantifying the quality information in large series of microarrays. This saves computation time and reduces decision complexity for the analyst. The newly proposed rule is validated with data sets from biomedical studies.
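To make the two agreement measures concrete, the following is a minimal sketch of how Cohen's Kappa and the AC1 statistic can be computed for two outlier-detection methods that each flag arrays as outlier (1) or not (0). The function names and the example vectors are hypothetical and serve only as illustration; they are not taken from the study or from the arrayQualityMetrics package.

```python
from typing import Sequence, Tuple


def agreement_table(x: Sequence[int], y: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count (both flag, only x flags, only y flags, neither flags)."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    a = sum(1 for i, j in zip(x, y) if i and j)
    b = sum(1 for i, j in zip(x, y) if i and not j)
    c = sum(1 for i, j in zip(x, y) if not i and j)
    d = sum(1 for i, j in zip(x, y) if not i and not j)
    return a, b, c, d


def cohen_kappa(x: Sequence[int], y: Sequence[int]) -> float:
    """Cohen's Kappa for two binary ratings of the same arrays."""
    a, b, c, d = agreement_table(x, y)
    n = a + b + c + d
    p_obs = (a + d) / n                      # observed agreement
    p_x, p_y = (a + b) / n, (a + c) / n      # marginal outlier rates
    p_exp = p_x * p_y + (1 - p_x) * (1 - p_y)  # chance agreement
    if p_exp == 1.0:
        return float("nan")                  # undefined: no variation in ratings
    return (p_obs - p_exp) / (1 - p_exp)


def gwet_ac1(x: Sequence[int], y: Sequence[int]) -> float:
    """Gwet's AC1 for two binary ratings of the same arrays."""
    a, b, c, d = agreement_table(x, y)
    n = a + b + c + d
    p_obs = (a + d) / n
    pi = ((a + b) / n + (a + c) / n) / 2     # mean outlier prevalence
    p_exp = 2 * pi * (1 - pi)                # chance agreement, two categories
    return (p_obs - p_exp) / (1 - p_exp)


# Hypothetical flags from two methods on ten arrays (1 = outlier).
method_a = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
method_b = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
print(round(cohen_kappa(method_a, method_b), 3))
print(round(gwet_ac1(method_a, method_b), 3))
```

In this made-up example the two methods agree on eight of ten arrays, yet Kappa is only about 0.375 while AC1 is about 0.71. Outliers are rare, so Kappa's chance-agreement term becomes large; this sensitivity to trait prevalence is one reason for reporting both statistics side by side.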
Schmidberger, M., Vicedo, E. & Mansmann, U. Empirical study for the agreement between statistical methods in quality assessment and control of microarray data. Comput Stat 26, 259–277 (2011). https://doi.org/10.1007/s00180-010-0216-2