Abstract
Morbidity varies greatly among cancer patients who have similar profiles and receive similar treatments. One hypothesis is that this variation is explained by the genetic heterogeneity of the disease. DNA microarrays, which simultaneously measure the expression levels of thousands of genes, have been used successfully to identify such genetic differences. However, microarray data typically contain a large number of features (genes) and relatively few observations (samples), so conventional machine learning tools can fail when applied to them. We describe a recently developed procedure called "nearest shrunken centroids" that has successfully detected clinically relevant genetic differences in cancer patients. This procedure has the potential to become a powerful tool for diagnosing and treating cancer.
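The sketch below is a minimal illustration of nearest shrunken centroids. It follows our reading of the formulation in Tibshirani, Hastie, Narasimhan and Chu, "Diagnosis of multiple cancer types by shrunken centroids of gene expression" (Proceedings of the National Academy of Sciences, 2002). Each class centroid is first standardized against the overall centroid. The standardized differences are then soft-thresholded toward zero by an amount delta. Finally, a new sample is assigned to the class whose shrunken centroid is nearest in standardized squared distance, with a log-prior correction. Genes whose differences are shrunk to zero for every class no longer affect the classification. Several details are illustrative assumptions rather than code from this paper: the function names `fit_shrunken_centroids` and `predict`, taking the offset s0 as the median gene-wise standard deviation, and the simulated data.

```python
import numpy as np


def fit_shrunken_centroids(X, y, delta):
    """Fit a nearest-shrunken-centroid model (illustrative sketch).

    X: (n_samples, n_genes) expression matrix; y: class labels;
    delta: soft-threshold amount applied to the standardized differences.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)                      # sorted class labels
    n, _ = X.shape
    K = len(classes)
    counts = np.array([np.sum(y == k) for k in classes])
    overall = X.mean(axis=0)                    # overall centroid, one value per gene
    centroids = np.array([X[y == k].mean(axis=0) for k in classes])

    # Pooled within-class standard deviation s_i for each gene i.
    resid = X - centroids[np.searchsorted(classes, y)]
    s = np.sqrt((resid ** 2).sum(axis=0) / (n - K))
    s0 = np.median(s)                           # assumed offset: median of s_i
    m = np.sqrt(1.0 / counts - 1.0 / n)         # makes m_k * s_i a standard error

    # Standardized difference d_ik between class and overall centroids.
    scale = m[:, None] * (s + s0)               # shape (K, n_genes)
    d = (centroids - overall) / scale

    # Soft thresholding: move each d_ik toward zero by delta, clipping at zero.
    d_shrunk = np.sign(d) * np.maximum(np.abs(d) - delta, 0.0)

    return {
        "classes": classes,
        "centroids": overall + scale * d_shrunk,  # shrunken class centroids
        "sd": s + s0,
        "log_priors": np.log(counts / n),       # class priors from class frequencies
        "d": d_shrunk,
    }


def predict(model, X_new):
    """Assign each row of X_new to the class with the smallest discriminant score."""
    X_new = np.atleast_2d(np.asarray(X_new, dtype=float))
    diff = X_new[:, None, :] - model["centroids"][None, :, :]
    # Standardized squared distance to each shrunken centroid, minus 2 log prior.
    scores = (diff ** 2 / model["sd"] ** 2).sum(axis=2) - 2.0 * model["log_priors"]
    return model["classes"][np.argmin(scores, axis=1)]


# Hypothetical simulated data: 40 samples, 20 genes, only genes 0 and 1 differ by class.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 20))
y = np.array([0] * 20 + [1] * 20)
X[y == 1, :2] += 3.0

model = fit_shrunken_centroids(X, y, delta=2.0)
genes_used = np.flatnonzero(np.any(model["d"] != 0, axis=0))
accuracy = np.mean(predict(model, X) == y)
print("genes used:", genes_used)
print("training accuracy:", accuracy)
```

Raising delta shrinks more genes all the way to the overall centroid, which yields a smaller set of genes that drive the prediction. Here delta is fixed for simplicity, although in practice it would typically be chosen by cross-validation.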