Building a Research-Quality Copy Number Variation Data Repository for Translational Research

  • Conference paper
Heterogeneous Data Management, Polystores, and Analytics for Healthcare (DMAH 2018, Poly 2018)

Abstract

Copy number variation (CNV) has known associations with population diversity and disease conditions. However, research communities face great challenges in reusing CNV data because existing CNV data sources are heterogeneous. The objective of this study is to design, develop, and evaluate a scalable CNV data repository, based on a proposed common data schema, that facilitates research-quality CNV data integration and reuse. We proposed a CNV common data schema by analyzing multiple existing CNV data sources. We designed a collection of CNV quality metrics and demonstrated their usefulness using CNV data from a study of ovarian cancer xenograft models. We implemented a CNV data repository with a MongoDB database backend and established CNV genomic data services that enable reuse of the curated CNV data and help answer CNV-relevant research questions. We also discuss critical issues and future plans for system enhancement and community engagement.
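To make the idea of a document-oriented CNV repository concrete, the following is a minimal sketch of how a single CNV segment might be stored and how a region-overlap question might be phrased. It is not the schema or the services described in the paper: the field names (sample_id, chrom, start, end, log2_ratio, num_probes), the helper functions, and all sample values are hypothetical and chosen only for illustration. The filter uses the MongoDB query operators $lte and $gte; a small pure-Python evaluator stands in for the database so the example runs without a server.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class CnvSegment:
    """One CNV call. Field names are hypothetical, not the paper's schema."""
    sample_id: str
    chrom: str
    start: int         # 1-based inclusive start coordinate
    end: int           # 1-based inclusive end coordinate
    log2_ratio: float  # segment mean log2 copy ratio
    num_probes: int    # number of markers supporting the segment


def overlap_filter(chrom: str, start: int, end: int) -> dict:
    """Build a MongoDB-style filter for segments overlapping [start, end]."""
    return {"chrom": chrom, "start": {"$lte": end}, "end": {"$gte": start}}


def matches(doc: dict, flt: dict) -> bool:
    """Evaluate only the operators used above: equality, $lte, $gte."""
    for key, cond in flt.items():
        value = doc[key]
        if isinstance(cond, dict):
            if "$lte" in cond and not value <= cond["$lte"]:
                return False
            if "$gte" in cond and not value >= cond["$gte"]:
                return False
        elif value != cond:
            return False
    return True


# Illustrative records only; these are not data from the study.
segments = [
    CnvSegment("PDX-001", "chr8", 127_000_000, 128_500_000, 1.2, 340),
    CnvSegment("PDX-001", "chr17", 7_500_000, 7_700_000, -0.9, 55),
    CnvSegment("PDX-002", "chr8", 100_000_000, 110_000_000, 0.1, 2100),
]

# Which segments overlap a region of interest on chr8?
query = overlap_filter("chr8", 127_700_000, 127_800_000)
hits = [s for s in segments if matches(asdict(s), query)]

for seg in hits:
    print(seg.sample_id, seg.chrom, seg.start, seg.end, seg.log2_ratio)
```

With a running MongoDB instance, the same filter dictionary could be passed to a PyMongo collection's find method instead of the in-memory evaluator. Two intervals overlap exactly when each one starts no later than the other ends, which is why the filter needs only two comparisons.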

Acknowledgements

The study is supported in part by an NIH BD2KOnFHIR U01 project (U01 HG009450), an NCI U01 project, caCDE-QA (U01 CA180940), the Mayo Clinic Specialized Program in Research Excellence (SPORE) grant P50 CA136393, R01 CA184502 from the National Institutes of Health, the Minnesota Ovarian Cancer Alliance, and the Ovarian Cancer Research Fund Alliance.

Author information

Correspondence to Chen Wang.

Copyright information

© 2019 Springer Nature Switzerland AG

Cite this paper

Wang, C., Moore, R.M., Evans, J.M., Hou, X., Weroha, S.J., Jiang, G. (2019). Building a Research-Quality Copy Number Variation Data Repository for Translational Research. In: Gadepally, V., Mattson, T., Stonebraker, M., Wang, F., Luo, G., Teodoro, G. (eds) Heterogeneous Data Management, Polystores, and Analytics for Healthcare. DMAH 2018, Poly 2018. Lecture Notes in Computer Science, vol 11470. Springer, Cham. https://doi.org/10.1007/978-3-030-14177-6_12

  • DOI: https://doi.org/10.1007/978-3-030-14177-6_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-14176-9

  • Online ISBN: 978-3-030-14177-6

  • eBook Packages: Computer Science, Computer Science (R0)
