On the Combination of Dissimilarities for Gene Expression Data Analysis

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4669)

Abstract

DNA microarray technology allows us to monitor the expression levels of thousands of genes simultaneously. This technique has become a relevant tool for identifying different types of cancer.

Several machine learning techniques, such as Support Vector Machines (SVM), have been proposed for this purpose. However, common SVM algorithms are based on Euclidean distances, which do not accurately reflect the proximities among the sample profiles. The SVM has been extended to work with non-Euclidean dissimilarities, but no dissimilarity can be considered superior to the others because each one reflects different features of the data.
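As a rough illustration of why the choice of dissimilarity matters, the sketch below compares the Euclidean distance with a correlation-based dissimilarity (1 minus the Pearson correlation) on three toy expression profiles. The abstract does not name the dissimilarities used in the paper, so the correlation measure, the function names, and the toy values are all assumptions made for this example.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def correlation_dissimilarity(x, y):
    """1 - Pearson correlation: 0 for identical trends, 2 for opposite trends.

    Assumed here as one example of a non-Euclidean dissimilarity;
    it is not taken from the paper.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return 1.0 - cov / (norm_x * norm_y)


# Toy profiles (hypothetical values, four genes each).
profile_a = [1.0, 2.0, 3.0, 4.0]
profile_b = [10.0, 20.0, 30.0, 40.0]  # same trend as a, ten times the scale
profile_c = [4.0, 3.0, 2.0, 1.0]      # opposite trend, same magnitude as a

# Euclidean distance ranks c as closer to a than b is ...
print(euclidean(profile_a, profile_b), euclidean(profile_a, profile_c))
# ... while the correlation dissimilarity ranks b as closer to a.
print(correlation_dissimilarity(profile_a, profile_b),
      correlation_dissimilarity(profile_a, profile_c))
```

The two measures order the same neighbours differently: one responds to overall magnitude, the other to the shape of the profile. This is the sense in which each dissimilarity reflects different features of the data.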

In this paper, we propose combining several Support Vector Machines, each based on a different dissimilarity, to improve on the performance of classifiers based on a single measure. The experimental results suggest that our method reduces the misclassification errors relative to classifiers based on a single dissimilarity and to a widely used combination strategy such as Bagging.
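The general idea of combining classifiers built on different dissimilarities can be sketched as follows. This is not the authors' method: to keep the example self-contained, a 1-nearest-neighbour rule stands in for the SVM, a simple majority vote stands in for whatever combination rule the paper uses, and the three dissimilarities, the labels, and the toy data are assumptions chosen for illustration.

```python
import math
from collections import Counter


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def correlation_dissimilarity(x, y):
    """1 - Pearson correlation between two profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return 1.0 - cov / (norm_x * norm_y)


def cosine_dissimilarity(x, y):
    """1 - cosine of the angle between two profiles."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (norm_x * norm_y)


def nearest_neighbour(dissimilarity, train, query):
    """Stand-in base classifier (NOT an SVM): label of the closest sample."""
    _, label = min(train, key=lambda item: dissimilarity(item[0], query))
    return label


def combined_predict(dissimilarities, train, query):
    """One vote per dissimilarity, then a majority vote over the votes."""
    votes = [nearest_neighbour(d, train, query) for d in dissimilarities]
    winner = Counter(votes).most_common(1)[0][0]
    return winner, votes


# Hypothetical training set: class "A" has a rising trend, class "B" is flat.
train = [
    ([1.0, 2.0, 3.0, 4.0], "A"),
    ([2.0, 3.0, 5.0, 6.0], "A"),
    ([9.0, 10.0, 9.0, 10.0], "B"),
    ([10.0, 9.0, 10.0, 9.0], "B"),
]
query = [8.0, 16.0, 24.0, 32.0]  # rising trend at a larger scale

measures = [euclidean, correlation_dissimilarity, cosine_dissimilarity]
label, votes = combined_predict(measures, train, query)
print(votes, "->", label)
```

On this toy query the Euclidean classifier votes for class "B" because of the profile's magnitude, while the two shape-based measures vote for "A", so the majority vote overrides the single Euclidean decision. An odd number of base classifiers is used so that a two-class vote cannot tie.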

Editor information

Joaquim Marques de Sá, Luís A. Alexandre, Włodzisław Duch, Danilo Mandic

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Blanco, Á., Martín-Merino, M., De Las Rivas, J. (2007). On the Combination of Dissimilarities for Gene Expression Data Analysis. In: de Sá, J.M., Alexandre, L.A., Duch, W., Mandic, D. (eds) Artificial Neural Networks – ICANN 2007. ICANN 2007. Lecture Notes in Computer Science, vol 4669. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74695-9_12

  • DOI: https://doi.org/10.1007/978-3-540-74695-9_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-74693-5

  • Online ISBN: 978-3-540-74695-9

  • eBook Packages: Computer Science, Computer Science (R0)
