Abstract
We have found one reason why AdaBoost tends not to perform well on gene expression data, and we identify simple modifications that improve its ability to find accurate class prediction rules. These modifications appear to be needed especially when there is a strong association between expression profiles and class designations. Cross-validation analysis of six microarray datasets with differing characteristics suggests that, suitably modified, boosting provides competitive classification accuracy in general.
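For concreteness, the following is a minimal sketch of the base procedure under discussion: AdaBoost with one-gene decision stumps on an expression matrix. It is not the paper's exact modification; the function names and the smoothing constant eps are illustrative assumptions. The smoothed weight formula is one commonly used guard against the infinite vote AdaBoost would otherwise assign to a stump with zero weighted training error, a situation that arises readily when expression profiles are strongly associated with class labels.

```python
# A sketch of AdaBoost with one-gene decision stumps -- illustrative only,
# not the authors' exact procedure. X: (n_samples, n_genes) expression
# matrix; y: labels in {-1, +1}.
import numpy as np

def best_stump(X, y, w):
    """Exhaustively pick the (gene, threshold, sign) minimizing weighted error."""
    n, d = X.shape
    best = (np.inf, 0, 0.0, 1)            # (error, gene, threshold, sign)
    for j in range(d):
        for t in np.unique(X[:, j]):
            for s in (1, -1):
                pred = np.where(X[:, j] > t, s, -s)
                err = w[pred != y].sum()
                if err < best[0]:
                    best = (err, j, t, s)
    return best

def adaboost_stumps(X, y, rounds=100, eps=1e-6):
    n = len(y)
    w = np.full(n, 1.0 / n)               # uniform initial example weights
    ensemble = []
    for _ in range(rounds):
        err, j, t, s = best_stump(X, y, w)
        # Smoothed alpha: stays finite even when err == 0, so a single
        # perfectly separating gene cannot dominate the vote.
        alpha = 0.5 * np.log((1 - err + eps) / (err + eps))
        ensemble.append((alpha, j, t, s))
        pred = np.where(X[:, j] > t, s, -s)
        w *= np.exp(-alpha * y * pred)    # upweight misclassified samples
        w /= w.sum()
    return ensemble

def predict(ensemble, X):
    score = sum(a * np.where(X[:, j] > t, s, -s) for a, j, t, s in ensemble)
    return np.sign(score)
```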
Sometimes the goal in a microarray analysis is to find a class prediction rule that is not only accurate, but that depends on the expression levels of only a few genes. Because boosting seeks out genes that provide complementary evidence about the correct classification of a tissue sample, it appears especially well suited to such gene-efficient class prediction. This is particularly true when there is a strong association between expression profiles and class designations, as is often the case, for example, when comparing tumor and normal samples.
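One way to see why stump-based boosting lends itself to gene-efficient prediction: each stump consults a single gene, so the voting rule depends on exactly the distinct genes appearing in the ensemble, and that set can be enumerated directly. Continuing the hypothetical sketch above:

```python
def genes_used(ensemble):
    """Distinct genes the voting rule actually consults."""
    return sorted({j for _, j, _, _ in ensemble})
```

By construction, len(genes_used(ensemble)) is at most the number of boosting rounds, and it is typically smaller when later rounds revisit informative genes.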
Cite this article
Long, P.M., Vega, V.B. Boosting and Microarray Data. Machine Learning 52, 31–44 (2003). https://doi.org/10.1023/A:1023937123600