Abstract
In this work, we introduce two novel contributions to the study of comorbidity. The first is a new method for finding disease correlations that draws on multiple information sources. In the era of big data, methods such as evidence synthesis enable researchers to exploit many freely available information sources to enrich their analyses. This forms the basis for our method: rather than examining a single form of evidence, we introduce a novel combination of sources that provides an indirect association between patient genetic data and the scientific literature. Our second contribution is a new method for stratifying the scientific literature when searching for newly discovered disease correlations. Because the volume of published biomedical literature has increased dramatically, a clinician cannot read every relevant article. We therefore propose a new way of refining the literature search space to discover recently introduced disease correlations. Results show that our system can produce reasonable hypotheses for disease correlations, and that document stratification is an important aspect to take into account when using the scientific literature.
Acknowledgements
This work has been supported by the EPSRC. We thank the reviewers for their helpful comments.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Heffernan, K., Liò, P., Teufel, S. (2017). Multilayer Data and Document Stratification for Comorbidity Analysis. In: Bracciali, A., Caravagna, G., Gilbert, D., Tagliaferri, R. (eds) Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2016. Lecture Notes in Computer Science, vol 10477. Springer, Cham. https://doi.org/10.1007/978-3-319-67834-4_17
DOI: https://doi.org/10.1007/978-3-319-67834-4_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67833-7
Online ISBN: 978-3-319-67834-4
eBook Packages: Computer Science, Computer Science (R0)