HAUCA Curves for the Evaluation of Biomarker Pilot Studies with Small Sample Sizes and Large Numbers of Features

  • Conference paper
Advances in Intelligent Data Analysis XV (IDA 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9897)

Abstract

Biomarker studies often try to identify a combination of measured attributes to support the diagnosis of a specific disease. The measured values commonly come from high-throughput technologies such as next generation sequencing, which yield an abundance of biomarker candidates compared to the often very small sample size. Here we use an example with more than 50,000 biomarker candidates that we want to evaluate based on a sample of only 24 patients. This seems to be an impossible task, and purely random correlations are guaranteed to be found. Although specific biomarkers cannot be identified in such small pilot studies with purely statistical methods, one can still determine whether more biomarkers show a high correlation with the disease under consideration than one would expect in a setting where correlations are purely random. We propose a method based on area under the ROC curve (AUC) values that indicates how much the correlations of the biomarkers with the disease of interest exceed pure random effects. We also provide estimates of the sample sizes that follow-up studies need in order to identify concrete biomarkers and build classifiers for the disease, and we describe how our method can be extended to performance measures other than AUC.
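
The point about random correlations can be illustrated with a small simulation. The sketch below is not the HAUCA construction from the paper; it only counts how many pure-noise features reach a given AUC by chance. The function names, the standard normal noise model, the 12/12 split of the 24 patients, and the folding of the AUC around 0.5 are all assumptions made for illustration.

```python
import random


def auc(pos, neg):
    """Mann-Whitney estimate of the AUC.

    Fraction of (pos, neg) pairs in which the positive sample has the
    larger value; ties count as one half.
    """
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def count_high_auc_by_chance(n_features, n_pos, n_neg, threshold, seed=0):
    """Count pure-noise features whose AUC reaches `threshold`.

    Every feature is drawn from the same standard normal distribution
    in both groups (an assumed noise model), so any high AUC is a
    random effect.
    """
    rng = random.Random(seed)
    count = 0
    for _ in range(n_features):
        pos = [rng.gauss(0, 1) for _ in range(n_pos)]
        neg = [rng.gauss(0, 1) for _ in range(n_neg)]
        a = auc(pos, neg)
        # Assumption: a feature is equally interesting when low values
        # indicate disease, so the AUC is folded around 0.5.
        if max(a, 1 - a) >= threshold:
            count += 1
    return count
```

Calling, for example, `count_high_auc_by_chance(50000, 12, 12, 0.9)` gives the number of noise features with a folded AUC of at least 0.9 in a setting of the size described above. A real data set would be of interest when it contains clearly more such features than this chance baseline.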

Notes

  1. This is a more or less realistic assumption for microarray and next generation sequencing data but not for data from mass spectrometry.

  2. The data set is currently submitted to a medical journal.

  3. The HAUCA curves were neither available nor discussed in the paper [10].

References

  1. De Angelis, G., Rittenhouse, H., Mikolajczyk, S., Blair, S., Semjonow, A.: Twenty years of PSA: from prostate antigen to tumor marker. Rev. Urol. 9(3), 113–123 (2007)

  2. Lichtinghagen, R., Pietsch, D., Bantel, H., Manns, M., Brand, K., Bahr, M.: The enhanced liver fibrosis (ELF) score: normal values, influence factors and proposed cut-off values. J. Hepatol. 59(2), 236–242 (2013)

  3. Ambroise, C., McLachlan, G.J.: Selection bias in gene extraction on the basis of microarray gene-expression data. Proc. Natl. Acad. Sci. 99(10), 6562–6566 (2002)

  4. Varma, S., Simon, R.: Bias in error estimation when using cross-validation for model selection. BMC Bioinform. 7(91), 1 (2006). doi:10.1186/1471-2105-7-91

  5. Omar, M., Klawonn, F., Brand, S., Stiesch, M., Krettek, C., Eberhard, J.: Transcriptome-wide high-density microarray analysis reveals differential gene transcription in periprosthetic tissue from hips with low-grade infection versus aseptic loosening. J. Arthroplasty (2016, to appear). doi:10.1016/j.arth.2016.06.036

  6. Hand, D.: Measuring classifier performance: a coherent alternative to the area under the ROC curve. Mach. Learn. 77, 103–123 (2009)

  7. Flach, P., Hernández-Orallo, J., Ferri, C.: A coherent interpretation of AUC as a measure of aggregated classification performance. In: Proceedings of the 28th International Conference on Machine Learning (ICML 2011), pp. 657–664 (2011)

  8. Mason, S.J., Graham, N.E.: Areas beneath the relative operating characteristics (ROC) and relative operating levels (ROL) curves: Statistical significance and interpretation. Q. J. Royal Meteorol. Soc. 128(584), 2145–2166 (2002)

  9. Holm, S.: A simple sequentially rejective multiple test procedure. Scand. J. Stat. 6, 65–70 (1979)

  10. Szafranski, S., Wos-Oxley, M., Vilchez-Vargas, R., Jáuregui, R., Plumeier, I., Klawonn, F., Tomasch, J., Meisinger, C., Kühnisch, J., Sztajer, H., Pieper, D., Wagner-Döbler, I.: High-resolution taxonomic profiling of the subgingival microbiome for biomarker discovery and periodontitis diagnosis. Appl. Environ. Microbiol. 81, 1047–1058 (2015)

  11. Demler, O., Pencina, M., D’Agostino, R.S.: Impact of correlation on predictive ability of biomarkers. Stat. Med. 32, 4196–421 (2013)

  12. Montvida, O., Klawonn, F.: Relative cost curves: An alternative to AUC and an extension to 3-class problems. Kybernetika 50, 647–660 (2014)

  13. Hand, D., Till, R.: A simple generalisation of the area under the ROC curve for multiple class classification problems. Mach. Learn. 45, 171–186 (2001)

  14. Li, J., Fine, J.: ROC analysis with multiple classes and multiple tests: methodology and its application in microarray studies. Biostatistics 9, 566–576 (2008)

  15. Li, J., Fine, J.: Nonparametric and semiparametric estimation of the three way receiver operating characteristic surface. J. Stat. Plan. Infer. 139, 4133–4142 (2009)

  16. Hernández-Orallo, J.: ROC curves for regression. Pattern Recogn. 46(12), 3395–3411 (2013)

  17. Novoselova, N., Della Beffa, C., Wang, J., Li, J., Pessler, F., Klawonn, F.: HUM calculator and HUM package for R: easy-to-use software tools for multicategory receiver operating characteristic analysis. Bioinformatics 30, 1635–1636 (2014)

Author information

Corresponding author

Correspondence to Frank Klawonn.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Klawonn, F., Wang, J., Koch, I., Eberhard, J., Omar, M. (2016). HAUCA Curves for the Evaluation of Biomarker Pilot Studies with Small Sample Sizes and Large Numbers of Features. In: Boström, H., Knobbe, A., Soares, C., Papapetrou, P. (eds) Advances in Intelligent Data Analysis XV. IDA 2016. Lecture Notes in Computer Science, vol 9897. Springer, Cham. https://doi.org/10.1007/978-3-319-46349-0_31

  • DOI: https://doi.org/10.1007/978-3-319-46349-0_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-46348-3

  • Online ISBN: 978-3-319-46349-0

  • eBook Packages: Computer Science, Computer Science (R0)
