Measuring and visualizing the stability of biomarker selection techniques

Lausser, Ludwig; Müssel, Christoph; Maucher, Markus; Kestler, Hans A.

doi:10.1007/s00180-011-0284-y

Original Paper
Published: 05 October 2011

Computational Statistics, Volume 28, pages 51–65 (2013)

Abstract

Feature selection is an essential step when dealing with high-dimensional data. In a diagnostic setting, marker genes have to be selected for specialized low-dimensional gene expression assays. A meaningful biomarker selection is expected to produce stable results in different resampling settings. We define an index to quantify stability and introduce a statistical testing procedure for stability. We also present new methods of visualizing stability and associating it with the accuracy of a subsequent classification process.
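
The abstract does not spell out the index itself, so the following is only a minimal sketch of the general idea of quantifying selection stability under resampling. It is not the authors' index: it swaps in a plain mean pairwise Jaccard similarity between the feature sets chosen on bootstrap resamples, and the selector (ranking features by absolute difference of class means) and all function names are assumptions made for illustration.

```python
import itertools
import random
import statistics


def top_k_features(samples, labels, k):
    """Hypothetical selector: rank features by the absolute difference
    of class means and return the indices of the top k as a set."""
    n_features = len(samples[0])
    scores = []
    for j in range(n_features):
        pos = [x[j] for x, y in zip(samples, labels) if y == 1]
        neg = [x[j] for x, y in zip(samples, labels) if y == 0]
        scores.append(abs(statistics.fmean(pos) - statistics.fmean(neg)))
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return set(ranked[:k])


def jaccard(a, b):
    """Jaccard similarity of two non-empty sets."""
    return len(a & b) / len(a | b)


def selection_stability(samples, labels, k, n_resamples=20, seed=0):
    """Mean pairwise Jaccard similarity of the feature sets selected on
    bootstrap resamples (an assumed stand-in, not the paper's index).
    Requires n_resamples >= 2 and both classes present in the data."""
    rng = random.Random(seed)
    n = len(samples)
    selections = []
    while len(selections) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # redraw when a class is missing from the resample
        xs = [samples[i] for i in idx]
        selections.append(top_k_features(xs, ys, k))
    pairs = list(itertools.combinations(selections, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

A value near 1 means the selector picks nearly the same markers on every resample, while a value near 0 means the selected sets barely overlap. Pairing this score with the cross-validated accuracy of a classifier trained on the selected features would give the kind of stability-versus-accuracy comparison the abstract alludes to.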

Author information

Authors and Affiliations

Research group Bioinformatics and Systems Biology, Institute of Neural Information Processing, University of Ulm, 89069, Ulm, Germany
Ludwig Lausser, Christoph Müssel, Markus Maucher & Hans A. Kestler

Corresponding author

Correspondence to Hans A. Kestler.

About this article

Cite this article

Lausser, L., Müssel, C., Maucher, M. et al. Measuring and visualizing the stability of biomarker selection techniques. Comput Stat 28, 51–65 (2013). https://doi.org/10.1007/s00180-011-0284-y

Received: 15 October 2010
Accepted: 23 September 2011
Published: 05 October 2011
Issue Date: February 2013
DOI: https://doi.org/10.1007/s00180-011-0284-y
