Abstract

Recent advances in machine learning and pattern recognition provide new analytical tools for exploring high-dimensional gene expression microarray data. Our data mining software, VISual Data Analyzer for cluster discovery (VISDA), reveals many distinguishing patterns among the gene expression profiles that are responsible for cell phenotypes. The model-supported exploration of the high-dimensional data space relies on two complementary schemes: dimensionality reduction by discriminatory data projection and cluster decomposition by soft data clustering. Dimensionality reduction produces a visualization of the complete data set at the top level. The data set is then partitioned into subclusters, which can in turn be visualized at lower levels and, if necessary, partitioned again. In this paper, three algorithms are evaluated for their ability to reduce dimensionality and visualize data sets: Principal Component Analysis (PCA), Discriminatory Component Analysis (DCA), and the Projection Pursuit Method (PPM). The partitioning into subclusters uses the Expectation-Maximization (EM) algorithm with a hierarchical normal mixture model that is selected by the user and checked for optimality with the Minimum Description Length (MDL) criterion. These approaches produce different visualizations, which are compared against known phenotypes from the microarray experiments. Overall, these algorithms and user-selected models make it possible to explore high-dimensional data in cases where standard analyses may not be sufficient.
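
The two schemes named in the abstract, projection followed by soft clustering with MDL-based model checking, can be illustrated with a minimal sketch. This is not the VISDA implementation: it assumes NumPy, uses synthetic data in place of microarray profiles, takes PCA as the only projection (DCA and PPM are not shown), fits a flat rather than hierarchical normal mixture, and replaces the user's model selection with a simple scan over candidate cluster counts. The helper names `pca_project`, `log_gauss`, `em_mixture`, and `mdl` are hypothetical.

```python
import numpy as np

def pca_project(X, k=2):
    """Project the rows of X onto the top-k principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def log_gauss(X, mean, cov):
    """Log density of a multivariate normal at each row of X."""
    d = X.shape[1]
    diff = X - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

def em_mixture(X, K, n_iter=100, seed=0):
    """Fit a K-component normal mixture by EM.

    Returns the soft memberships (n x K) and the log-likelihood
    computed in the final E-step.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    means = X[rng.choice(n, size=K, replace=False)]
    covs = np.array([np.atleast_2d(np.cov(X.T)) + 1e-6 * np.eye(d)] * K)
    weights = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each sample.
        log_p = np.stack(
            [np.log(weights[k]) + log_gauss(X, means[k], covs[k]) for k in range(K)],
            axis=1,
        )
        log_norm = np.logaddexp.reduce(log_p, axis=1)
        resp = np.exp(log_p - log_norm[:, None])
        # M-step: re-estimate weights, means, and covariances.
        Nk = resp.sum(axis=0) + 1e-12
        weights = Nk / n
        means = (resp.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - means[k]
            # Small ridge term keeps each covariance invertible (an assumption).
            covs[k] = (resp[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(d)
    return resp, log_norm.sum()

def mdl(loglik, K, d, n):
    """Two-part MDL score: negative log-likelihood plus a parameter cost."""
    n_params = (K - 1) + K * d + K * d * (d + 1) // 2
    return -loglik + 0.5 * n_params * np.log(n)

# Synthetic stand-in for expression profiles: 180 samples, 50 features, 3 groups.
rng = np.random.default_rng(1)
centers = rng.normal(scale=4.0, size=(3, 50))
labels = np.repeat(np.arange(3), 60)
X = centers[labels] + rng.normal(size=(180, 50))

Y = pca_project(X, k=2)  # top-level 2-D view of the whole data set
scores = {}
for K in range(1, 5):
    _, loglik = em_mixture(Y, K)
    scores[K] = mdl(loglik, K, Y.shape[1], Y.shape[0])

best_K = min(scores, key=scores.get)
print("MDL scores:", scores)
print("Cluster count with the lowest MDL score:", best_K)
```

In a hierarchical setting, the same three steps would be repeated on each subcluster, weighting samples by their soft memberships; that recursion is omitted here for brevity.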

About this article

Cite this article

Wang, Z., Wang, Y., Lu, J. et al. Discriminatory Mining of Gene Expression Microarray Data. The Journal of VLSI Signal Processing-Systems for Signal, Image, and Video Technology 35, 255–272 (2003). https://doi.org/10.1023/B:VLSI.0000003024.13494.40
