Abstract
Copy number variation (CNV) has known associations with population diversity and disease conditions. However, research communities face great challenges in reusing CNV data because existing CNV data sources are heterogeneous. The objective of this study is to design, develop, and evaluate a scalable CNV data repository based on a proposed common data schema to facilitate research-quality CNV data integration and reuse. We proposed a CNV common data schema by analyzing multiple existing CNV data sources. We designed a collection of CNV quality metrics and demonstrated their usefulness using CNV data from a study of ovarian cancer xenograft models. We implemented a CNV data repository using a MongoDB database backend and established CNV genomic data services that enable reuse of the curated CNV data and help answer CNV-relevant research questions. We also discuss critical issues and future plans for system enhancement and community engagement.
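To make the idea of a common schema with a document-store backend concrete, the following is a minimal sketch of one CNV segment record and a genomic range-overlap lookup. It is not the schema or service API from the paper: the field names (`sample_id`, `chrom`, `start`, `end`, `log2_ratio`), the helper functions, and the toy values are all assumptions made for illustration. The filter is built as a plain dictionary in MongoDB query syntax and checked against an in-memory equivalent, so the sketch runs without a database.

```python
from dataclasses import dataclass, asdict
from typing import Dict, List


@dataclass(frozen=True)
class CnvSegment:
    """One CNV call. Field names are hypothetical, not the paper's schema."""
    sample_id: str
    chrom: str
    start: int         # 1-based inclusive start coordinate
    end: int           # 1-based inclusive end coordinate
    log2_ratio: float  # copy number signal relative to a reference


def overlap_filter(chrom: str, start: int, end: int) -> Dict:
    """Build a MongoDB-style filter for segments overlapping [start, end]."""
    return {"chrom": chrom, "start": {"$lte": end}, "end": {"$gte": start}}


def overlaps(seg: CnvSegment, chrom: str, start: int, end: int) -> bool:
    """In-memory equivalent of overlap_filter, used to check the logic."""
    return seg.chrom == chrom and seg.start <= end and seg.end >= start


def find_overlapping(segments: List[CnvSegment], chrom: str,
                     start: int, end: int) -> List[CnvSegment]:
    """Return every segment that overlaps the query interval."""
    return [s for s in segments if overlaps(s, chrom, start, end)]


# Toy data, invented for this example only.
segments = [
    CnvSegment("S1", "8", 127_000_000, 128_000_000, 1.2),
    CnvSegment("S2", "8", 130_000_000, 131_000_000, -0.8),
    CnvSegment("S3", "17", 7_500_000, 7_700_000, -1.1),
]

# Plain dictionaries of the kind a document store would hold.
documents = [asdict(s) for s in segments]

hits = find_overlapping(segments, "8", 127_500_000, 129_000_000)
query = overlap_filter("8", 127_500_000, 129_000_000)
```

With a running MongoDB instance, a dictionary like `query` could be passed to a collection's `find` method. Two closed intervals overlap exactly when each one starts no later than the other ends, which is the condition both the filter and the in-memory check encode.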
Acknowledgements
This study was supported in part by an NIH BD2KOnFHIR U01 project (U01 HG009450), an NCI U01 project, caCDE-QA (U01 CA180940), the Mayo Clinic Specialized Program in Research Excellence (SPORE) grant P50 CA136393, grant R01 CA184502 from the National Institutes of Health, the Minnesota Ovarian Cancer Alliance, and the Ovarian Cancer Research Fund Alliance.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Wang, C., Moore, R.M., Evans, J.M., Hou, X., John Weroha, S., Jiang, G. (2019). Building a Research-Quality Copy Number Variation Data Repository for Translational Research. In: Gadepally, V., Mattson, T., Stonebraker, M., Wang, F., Luo, G., Teodoro, G. (eds) Heterogeneous Data Management, Polystores, and Analytics for Healthcare. DMAH Poly 2018. Lecture Notes in Computer Science, vol 11470. Springer, Cham. https://doi.org/10.1007/978-3-030-14177-6_12
DOI: https://doi.org/10.1007/978-3-030-14177-6_12
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-14176-9
Online ISBN: 978-3-030-14177-6
eBook Packages: Computer Science, Computer Science (R0)