Active Mining Discriminative Gene Sets

(Invited)

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4029)

Abstract

Searching for good discriminative gene sets (DGSs) in microarray data is important for many problems, such as precise cancer diagnosis, correct treatment selection, and drug discovery. Small, good DGSs can help researchers eliminate "irrelevant" genes and focus on "critical" genes that may be used as biomarkers or that are related to the development of cancers. In addition, small DGSs place lighter demands on classifiers, e.g., they do not require high-speed CPUs or large memories. Furthermore, if DGSs are used as diagnostic measures in the future, small DGSs will simplify the tests and therefore reduce their cost. Here, we propose an algorithm for searching for DGSs, which we call active mining discriminative gene sets (AM-DGS). The AM-DGS search scheme is as follows. The gene with a large t-statistic is assigned as a seed, i.e., the first feature of the DGS. We then classify the samples in the data set using a support vector machine (SVM). Next, we add to the DGS the gene with the greatest power to correct the misclassified samples, that is, the gene with the largest t-statistic evaluated using only the misclassified samples. We keep adding genes to the DGS according to the SVM's misclassified data until no error appears or overfitting occurs. We tested the proposed method on the well-known leukemia data set. On this data set, our method obtained two 2-gene DGSs that achieved 94.1% testing accuracy and a 4-gene DGS that achieved 97.1% testing accuracy. These results show that our method obtained better accuracy with much smaller DGSs than three widely used methods, i.e., t-statistics, F-statistics, and SVM-based recursive feature elimination (SVM-RFE).
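The search scheme described in the abstract can be illustrated with a short greedy loop. The following is a minimal, stdlib-only Python sketch, not the authors' implementation, and it rests on several assumptions the abstract does not state: the t-statistic is taken to be the absolute two-sample (Welch) form; the seed is the single gene with the largest t-statistic; the SVM is a linear soft-margin SVM trained with Pegasos-style stochastic subgradient steps as a stand-in for a library solver; and the "overfitting occurs" condition is replaced by a hypothetical `max_genes` cap. The function names `t_statistic`, `train_linear_svm`, and `am_dgs` are illustrative only. The sketch also stops when the misclassified samples do not contain at least two samples of each class, since the t-statistic cannot be computed then.

```python
import math
import random
from typing import List, Sequence


def t_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Absolute two-sample t-statistic (Welch form; an assumption of this sketch)."""
    n1, n2 = len(a), len(b)
    if n1 < 2 or n2 < 2:
        return 0.0  # undefined with fewer than 2 samples per class
    m1, m2 = sum(a) / n1, sum(b) / n2
    v1 = sum((x - m1) ** 2 for x in a) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in b) / (n2 - 1)
    se = math.sqrt(v1 / n1 + v2 / n2)
    if se == 0.0:
        return math.inf if m1 != m2 else 0.0
    return abs(m1 - m2) / se


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(p * q for p, q in zip(u, v))


def train_linear_svm(X, y, lam=0.01, epochs=100, seed=0) -> List[float]:
    """Linear soft-margin SVM via Pegasos-style subgradient steps (hypothetical stand-in)."""
    rng = random.Random(seed)
    rows = [list(x) + [1.0] for x in X]  # constant feature plays the role of the bias
    w = [0.0] * len(rows[0])
    order = list(range(len(rows)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * dot(w, rows[i])  # hinge-loss check uses the current w
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink: L2 regularization step
            if margin < 1.0:  # margin violation: move w toward y_i * x_i
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, rows[i])]
    return w


def am_dgs(X, y, max_genes=5) -> List[int]:
    """Greedy AM-DGS-style search; X is samples x genes, y holds +1/-1 labels."""
    n_genes = len(X[0])

    def score(g: int, rows: Sequence[int]) -> float:
        # t-statistic of gene g, computed only on the given sample indices
        pos = [X[i][g] for i in rows if y[i] == 1]
        neg = [X[i][g] for i in rows if y[i] == -1]
        return t_statistic(pos, neg)

    all_rows = list(range(len(X)))
    # Seed: the gene with the largest t-statistic over all samples.
    dgs = [max(range(n_genes), key=lambda g: score(g, all_rows))]
    while len(dgs) < max_genes:  # hypothetical cap standing in for an overfitting check
        sub = [[row[g] for g in dgs] for row in X]
        w = train_linear_svm(sub, y)
        wrong = [i for i in all_rows if y[i] * dot(w, sub[i] + [1.0]) <= 0]
        if not wrong:
            break  # no training errors left
        candidates = [g for g in range(n_genes) if g not in dgs]
        if not candidates:
            break
        # Add the gene whose t-statistic on the misclassified samples is largest.
        best = max(candidates, key=lambda g: score(g, wrong))
        if score(best, wrong) == 0.0:
            break  # misclassified set too small or one-sided to score genes
        dgs.append(best)
    return dgs
```

For example, `am_dgs(X, y, max_genes=4)` returns the indices of the selected genes in the order they were added, where `X` is a list of samples (rows of gene expression values) and `y` holds +1/-1 class labels. Measuring testing accuracy on held-out samples, as reported for the leukemia data set, is outside the scope of this sketch.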

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chu, F., Wang, L. (2006). Active Mining Discriminative Gene Sets. In: Rutkowski, L., Tadeusiewicz, R., Zadeh, L.A., Żurada, J.M. (eds) Artificial Intelligence and Soft Computing – ICAISC 2006. ICAISC 2006. Lecture Notes in Computer Science, vol 4029. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11785231_92

  • DOI: https://doi.org/10.1007/11785231_92

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-35748-3

  • Online ISBN: 978-3-540-35750-6

  • eBook Packages: Computer Science, Computer Science (R0)