ABSTRACT
Clinical and genomic datasets contain vast amounts of information, but each is typically used independently within its own domain to produce new science or to better explain existing approaches. Data rarely flows between the two domains, so the information remains scattered. These disparate datasets need to be integrated so that the scattered pieces of information are consolidated into a unified knowledge base that can support new research challenges. However, no available platform integrates clinical and genomic datasets into a consistent, coherent data source and produces analytics from it. We propose a data integration model capable of integrating clinical and genomic datasets using meta-dimensional approaches and machine learning methods. Bayesian networks, a meta-dimensional approach, will be used to design a probabilistic data model, and neural networks, a machine learning method, will be used for classification and pattern recognition on the integrated data. This integration will help to bring together the genetic background underlying clinical traits, which should benefit efforts to derive new research insights for drug design and precision medicine.
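As a rough illustration of how a Bayesian network can serve as a probabilistic data model spanning both domains, the following minimal sketch links one genomic variable and one clinical variable to a clinical trait. It is not the proposed model: the variable names, the network structure, and every probability value are hypothetical and chosen only for illustration. Inference is done by plain enumeration over the joint distribution.

```python
from itertools import product

# Hypothetical network: Variant -> Trait <- RiskFactor.
# All probabilities below are made-up illustrative values, not real data.
P_VARIANT = {1: 0.2, 0: 0.8}   # P(genomic variant present / absent)
P_RISK = {1: 0.3, 0: 0.7}      # P(clinical risk factor present / absent)
# P(trait = 1 | variant, risk factor)
P_TRAIT = {(0, 0): 0.05, (0, 1): 0.15, (1, 0): 0.30, (1, 1): 0.60}


def joint(variant: int, risk: int, trait: int) -> float:
    """Joint probability, factorised along the edges of the network."""
    p_trait = P_TRAIT[(variant, risk)]
    return P_VARIANT[variant] * P_RISK[risk] * (p_trait if trait else 1 - p_trait)


def posterior_variant_given_trait(trait: int = 1) -> float:
    """P(variant = 1 | trait), marginalising out the clinical risk factor."""
    numerator = sum(joint(1, r, trait) for r in (0, 1))
    denominator = sum(joint(v, r, trait) for v, r in product((0, 1), repeat=2))
    return numerator / denominator


if __name__ == "__main__":
    print(f"P(variant | trait observed) = {posterior_variant_given_trait(1):.3f}")
```

In this toy setting, observing the clinical trait raises the probability that the variant is present above its prior of 0.2, which is the kind of cross-domain query an integrated model is meant to answer. A realistic model would learn the structure and the probability tables from the integrated data rather than fixing them by hand.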