Visualisation of “High p, Small n” data

  • Original Paper
Computational Statistics

Abstract

Development of methods for visualisation of high-dimensional data where the number of observations, n, is small compared to the number of variables, p, is of increasing importance. One major application is the burgeoning field of microarray (gene expression) experiments. Because of their high cost, the number of chips (n) is O(10–10²) while the number (p) of genes (including expressed sequence tags) on each chip is O(10³–10⁴). Based on synthetic data simulated in accord with current biological interpretation of microarray data, we have adapted the biplot that simultaneously plots the genes and the chips to display relevant experimental information. Other ordination techniques are also useful for visually exploring microarray data. The biological information that can be revealed by applying these exploratory, visual techniques is illustrated using data from gene expression experiments. When ordination methods, or dimension reduction methods such as PCA and its many variants, are used in association with gene selection methods, it is well known that “selection bias” can result. We show an application of bootstrap methodology to ordination methods that can be used to account for this bias. Such methods are invaluable when visualisation methods are used for pattern recognition, such as when identifying previously unknown sub-classes of tumours in molecular classification.
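
The selection-bias point can be made concrete with a small sketch. It is not the authors' procedure: it assumes a simple variance filter as the gene selection step, synthetic two-class data, and hypothetical function names (`simulate`, `select_genes`, `selection_frequency`). The idea it illustrates is that gene selection is repeated inside every bootstrap resample of the chips, so the stability of the selected gene set can be judged before any ordination is drawn from it.

```python
import random

def simulate(n_genes=200, n_chips=20, n_informative=10, shift=4.0, seed=0):
    """Synthetic expression matrix (rows = genes, columns = chips).
    Assumption: two equal-sized classes of chips; only the first
    n_informative genes differ between the classes."""
    rng = random.Random(seed)
    labels = [0 if j < n_chips // 2 else 1 for j in range(n_chips)]
    data = [
        [rng.gauss(0.0, 1.0) + (shift * labels[j] if g < n_informative else 0.0)
         for j in range(n_chips)]
        for g in range(n_genes)
    ]
    return data, labels

def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def select_genes(data, chip_idx, k):
    """Return the indices of the k genes with the largest variance
    across the chips listed in chip_idx (an assumed, simple filter)."""
    ranked = sorted(range(len(data)),
                    key=lambda g: variance([data[g][j] for j in chip_idx]),
                    reverse=True)
    return ranked[:k]

def selection_frequency(data, k, n_boot=100, seed=1):
    """Resample chips with replacement, redo the gene selection in each
    resample, and return the fraction of resamples selecting each gene."""
    rng = random.Random(seed)
    n_chips = len(data[0])
    counts = [0] * len(data)
    for _ in range(n_boot):
        chip_idx = [rng.randrange(n_chips) for _ in range(n_chips)]
        for g in select_genes(data, chip_idx, k):
            counts[g] += 1
    return [c / n_boot for c in counts]

data, labels = simulate()
freq = selection_frequency(data, k=10)
stable = [g for g, f in enumerate(freq) if f >= 0.8]
print("genes selected in at least 80% of resamples:", stable)
```

Genes that are selected in only a small fraction of resamples owe their place in a single-pass selection largely to chance, which is one way the apparent structure in a plot of selected genes can be overstated. The 80% threshold here is an arbitrary illustrative choice.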

References

  • Dudoit S, Fridlyand J, Speed TP (2002) Comparison of discrimination methods for the classification of tumours using gene expression data. J Am Stat Assoc 97:77–87

  • Ge Y, Dudoit S, Speed TP (2003) Resampling-based multiple testing for microarray data analysis (with discussion). TEST 12:1–78

  • Golub TR, Slonim DK, Tamayo P et al (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286:531–537

  • Pittelkow YE (2004) Analysis of high-density oligonucleotide microarray data: a statistical perspective. Thesis submitted towards the degree of Doctor of Philosophy, ANU

  • Pittelkow YE, Wilson SR (2003) Visualisation of gene expression data - the GE-biplot, the Chip-plot and the Gene-plot. Stat Appl Genet Mol Biol 2(1), Article 6:1–19

  • Pittelkow YE, Wilson SR (2005) Use of principal component analysis and of the GE-biplot for the graphical exploration of gene expression data. Biometrics 61:630–632

  • Wouters L, Göhlmann HW, Bijnens L, Kass SU, Molenberghs G, Lewi PJ (2003) Graphical exploration of gene expression data: a comparative study of three multivariate methods. Biometrics 59:1131–1139

Author information

Corresponding author

Correspondence to S. R. Wilson.

Additional information

A colour version of the paper is available at: DOI:10.1007/s00180-007-0060-1. If a colour version is not available, the sample numbers shown on the plots can be used to identify the different classes. The ALL B-cell samples are 1, 4, 5, 7, 8, 12, 13, 15, 16, 17, 18, 19, 20, 21, 22, 24, 25, 26 and 27. The ALL T-cell samples are 2, 3, 6, 9, 10, 11, 14 and 23. The AML samples are 28, 29, 30, 31, 32, 33, 34, 35, 36, 37 and 38.

About this article

Cite this article

Pittelkow, Y.E., Wilson, S.R. Visualisation of “High p, Small n” data. Computational Statistics 22, 533–541 (2007). https://doi.org/10.1007/s00180-007-0060-1

  • DOI: https://doi.org/10.1007/s00180-007-0060-1
