
A Scenario Implementation in R for SubtypeDiscovery Examplified on Chemoinformatics Data

  • Conference paper in: Leveraging Applications of Formal Methods, Verification and Validation (ISoLA 2008)

Abstract

We developed a methodology that both facilitates and enhances the search for homogeneous subtypes in data. We applied it to medical research on osteoarthritis and Parkinson’s disease, and to chemoinformatics research on the chemical-structure profiles of molecules. We release the methodology as the R package SubtypeDiscovery so that our analyses can be reproduced. In this paper, we present the package implementation and illustrate its output on molecular data from chemoinformatics. The methodology has three parts. It includes several data-processing techniques. It uses a computational approach that repeats the data modelling in order to select the number of subtypes or the type of model. It also provides methods to characterize, compare and evaluate the top-ranking models. The methodology therefore does more than cluster the data: it produces a complete set of results for conducting a subtype discovery analysis.
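The abstract does not specify which models or which ranking criterion are used. The following is a minimal sketch of the "repeat the modelling, then rank" loop it describes. It assumes Gaussian mixture models fitted by EM and ranked by the Bayesian information criterion (BIC). The steps are:

* z-score the columns as one example of preprocessing;
* refit every candidate number of subtypes `k` and covariance type ("spherical" or "diagonal") from several random starts;
* keep the top-ranking fits.

It is written in Python, not R. All function names (`standardize`, `fit_gmm`, `select_models`, ...) and the variance floor are hypothetical choices for illustration. They are not the SubtypeDiscovery package's API.

```python
import math
import random

VAR_FLOOR = 1e-3  # hypothetical regularisation: stops a component collapsing onto one point


def standardize(data):
    """Z-score each column (one possible preprocessing step); constant columns get sd 1."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in data) / n) or 1.0 for j in range(d)]
    return [[(row[j] - means[j]) / sds[j] for j in range(d)] for row in data]


def e_step(data, weights, means, variances):
    """Return the total log-likelihood and per-observation responsibilities."""
    ll, resp = 0.0, []
    for x in data:
        logp = []
        for w, mu, var in zip(weights, means, variances):
            lp = math.log(w)
            for xj, mj, vj in zip(x, mu, var):
                lp -= 0.5 * (math.log(2 * math.pi * vj) + (xj - mj) ** 2 / vj)
            logp.append(lp)
        top = max(logp)
        lse = top + math.log(sum(math.exp(v - top) for v in logp))  # log-sum-exp
        ll += lse
        resp.append([math.exp(v - lse) for v in logp])
    return ll, resp


def m_step(data, resp, k, model):
    """Re-estimate mixture weights, means and (floored) variances."""
    n, d = len(data), len(data[0])
    weights, means, variances = [], [], []
    for c in range(k):
        nc = sum(r[c] for r in resp) + 1e-10
        mu = [sum(r[c] * x[j] for r, x in zip(resp, data)) / nc for j in range(d)]
        sq = [sum(r[c] * (x[j] - mu[j]) ** 2 for r, x in zip(resp, data)) / nc for j in range(d)]
        if model == "spherical":
            sq = [sum(sq) / d] * d  # one variance shared by all dimensions
        weights.append(nc / n)
        means.append(mu)
        variances.append([max(v, VAR_FLOOR) for v in sq])
    return weights, means, variances


def fit_gmm(data, k, model, rng, n_iter=200, tol=1e-6):
    """Fit one mixture by EM from a random start; return (log_lik, params)."""
    d = len(data[0])
    means = [list(row) for row in rng.sample(data, k)]  # k distinct observations as seeds
    variances = [[1.0] * d for _ in range(k)]  # data are assumed standardised
    weights = [1.0 / k] * k
    prev = -math.inf
    for _ in range(n_iter):
        ll, resp = e_step(data, weights, means, variances)
        if ll - prev < tol:
            break
        prev = ll
        weights, means, variances = m_step(data, resp, k, model)
    ll, _ = e_step(data, weights, means, variances)
    return ll, (weights, means, variances)


def n_params(k, d, model):
    """Free parameters: k-1 weights, k*d means, plus k or k*d variances."""
    return (k - 1) + k * d + (k if model == "spherical" else k * d)


def select_models(data, k_max=4, models=("spherical", "diagonal"), n_starts=5, top=3, seed=0):
    """Refit every (k, model) pair from several starts; rank all fits by BIC (lower is better)."""
    rng = random.Random(seed)
    data = standardize(data)
    n, d = len(data), len(data[0])
    fits = []
    for model in models:
        for k in range(1, k_max + 1):
            for start in range(n_starts):
                ll, params = fit_gmm(data, k, model, rng)
                bic = -2.0 * ll + n_params(k, d, model) * math.log(n)
                fits.append({"k": k, "model": model, "start": start, "bic": bic, "params": params})
    fits.sort(key=lambda f: f["bic"])
    return fits[:top]


if __name__ == "__main__":
    gen = random.Random(1)
    # Two synthetic, well-separated groups in 2-D (illustrative data, not the paper's).
    pts = [[gen.gauss(0, 1), gen.gauss(0, 1)] for _ in range(60)]
    pts += [[gen.gauss(6, 1), gen.gauss(6, 1)] for _ in range(60)]
    for fit in select_models(pts):
        print(fit["model"], fit["k"], round(fit["bic"], 1))
```

The sketch records every (k, model, start) fit rather than only the overall winner. That leaves room for the later steps the abstract mentions: characterizing, comparing and evaluating the top-ranking models.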




Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Colas, F. et al. (2008). A Scenario Implementation in R for SubtypeDiscovery Examplified on Chemoinformatics Data. In: Margaria, T., Steffen, B. (eds) Leveraging Applications of Formal Methods, Verification and Validation. ISoLA 2008. Communications in Computer and Information Science, vol 17. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88479-8_48


  • DOI: https://doi.org/10.1007/978-3-540-88479-8_48

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-88478-1

  • Online ISBN: 978-3-540-88479-8

  • eBook Packages: Computer Science, Computer Science (R0)
