A Scenario Implementation in R for SubtypeDiscovery Examplified on Chemoinformatics Data

Colas, Fabrice; Meulenbelt, Ingrid; Houwing-Duistermaat, Jeanine J.; Kloppenburg, Margreet; Watt, Iain; van Rooden, Stephanie M.; Visser, Martine; Marinus, Johan; Cannon, Edward O.; Bender, Andreas; van Hilten, Jacobus J.; Slagboom, P. Eline; Kok, Joost N.

doi:10.1007/978-3-540-88479-8_48

Part of the book series: Communications in Computer and Information Science (CCIS, volume 17)

Included in the following conference series:

International Symposium On Leveraging Applications of Formal Methods, Verification and Validation

Abstract

We developed a methodology that both facilitates and enhances the search for homogeneous subtypes in data. We applied it to medical research on Osteoarthritis and Parkinson’s Disease, and to chemoinformatics research on the chemical structure of molecule profiles. We release the methodology as the R package SubtypeDiscovery to make our analyses reproducible. In this paper, we present the package implementation and illustrate its output on molecular data from chemoinformatics. The methodology comprises techniques to process the data, a computational approach that repeats data modelling to select a number of subtypes or a type of model, and methods to characterize, compare and evaluate the top-ranking models. It therefore does not merely cluster data; it also produces a complete set of results for conducting a subtype discovery analysis.

Author information

Authors and Affiliations

LIACS, Leiden University, The Netherlands
Fabrice Colas & Joost N. Kok
MOLEPI, LUMC, The Netherlands
Ingrid Meulenbelt, P. Eline Slagboom & Joost N. Kok
MEDSTATS, LUMC, The Netherlands
Jeanine J. Houwing-Duistermaat
Rheumatology dept., LUMC, The Netherlands
Margreet Kloppenburg
Radiology dept., LUMC, The Netherlands
Iain Watt
Neurology dept., LUMC, The Netherlands
Stephanie M. van Rooden, Martine Visser, Johan Marinus & Jacobus J. van Hilten
UCMSI, University of Cambridge, United Kingdom
Edward O. Cannon
LACDR, Leiden University, The Netherlands
Andreas Bender

Editor information

Editors and Affiliations

Universität Potsdam, August-Bebel-Str. 89, 14482, Potsdam, Germany
Tiziana Margaria
Technische Universität Dortmund, Otto-Hahn-Str. 14, 44227, Dortmund, Germany
Bernhard Steffen

About this paper

Cite this paper

Colas, F. et al. (2008). A Scenario Implementation in R for SubtypeDiscovery Examplified on Chemoinformatics Data. In: Margaria, T., Steffen, B. (eds) Leveraging Applications of Formal Methods, Verification and Validation. ISoLA 2008. Communications in Computer and Information Science, vol 17. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88479-8_48

DOI: https://doi.org/10.1007/978-3-540-88479-8_48
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-88478-1
Online ISBN: 978-3-540-88479-8
eBook Packages: Computer Science, Computer Science (R0)
