ABSTRACT
Access to clinical data is critical for advancing translational research, but regulatory constraints and policies surrounding the use of clinical data often impede data access and sharing. Mixed medical datasets (structured and unstructured) increasingly dominate the clinical information space and therefore demand AI-driven techniques, such as Natural Language Processing, to reorganize them for effective use. This paper mines the Human Metabolome Database (HMDB) for efficient knowledge extraction, supported by contributions from oncology physicians and pharmacists with diverse certifications. We propose a novel taxonomy for knowledge representation and establish a universe of discourse for disease clustering and prediction. The extracted data include metabolites and their respective concentration values, age, and gender, as well as gene and protein sequences, for normal and abnormal patients. These data were then merged to form AI-ready 'omic' datasets. Preliminary results suggest that the proposed AI-ready datasets would aid precision oncology research by adding quality analysis to the present HMDB and by helping to explain the variations in concentration values of cancer patients.
Index Terms
- Mining the Human Metabolome for Precision Oncology Research