
Building an i2b2-Based Integrated Data Repository for Cancer Research: A Case Study of Ovarian Cancer Registry

  • Conference paper
Data Management and Analytics for Medicine and Healthcare (DMAH 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10186)

Abstract

In this study, we describe our preliminary efforts to build an i2b2-based integrated data repository that supports centralized data management for ovarian cancer clinical research, and we discuss lessons learned that could inform the evaluation and enhancement of future generic cancer-specific data repositories. We collected multiple types of heterogeneous clinical data for ovarian cancer, including demographic, outcome, chemotherapy treatment, and laboratory test information. To integrate the different data types, we normalized the data by reusing standard codes and by creating mappings between local codes and standard vocabularies. We also developed extract, transform and load (ETL) scripts to load the data into an i2b2 instance. Through further analytic exercises, we evaluated how well the system meets common clinical research needs, including cohort query and identification, hypothesis testing based on clinical data, and exploratory data mining. We also identified and discussed outstanding issues that we will address through further enhancement of the existing i2b2 system.
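The normalization and ETL steps summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the local code `LAB_CA125`, the record type `LocalLabResult`, and the helper names are hypothetical, and the LOINC value shown is an illustrative placeholder that should be verified against LOINC. The output keys follow a subset of the columns commonly found in the i2b2 `observation_fact` table, which is also an assumption about the target schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical mapping from local codes to prefixed standard codes.
# The LOINC value is an illustrative placeholder; verify before use.
LOCAL_TO_STANDARD = {
    "LAB_CA125": "LOINC:10334-1",
}


@dataclass
class LocalLabResult:
    """A lab result as it might appear in a local registry export (hypothetical)."""
    patient_num: int
    encounter_num: int
    local_code: str
    value: float
    units: str
    observed_on: date


def to_observation_fact(rec: LocalLabResult) -> Optional[dict]:
    """Map one local record to a fact-style row, or None if the code is unmapped."""
    concept_cd = LOCAL_TO_STANDARD.get(rec.local_code)
    if concept_cd is None:
        return None
    return {
        "encounter_num": rec.encounter_num,
        "patient_num": rec.patient_num,
        "concept_cd": concept_cd,
        "start_date": rec.observed_on.isoformat(),
        "valtype_cd": "N",  # numeric value
        "nval_num": rec.value,
        "units_cd": rec.units,
    }


def transform(records):
    """Split records into loadable rows and local codes that still need a mapping."""
    facts, unmapped = [], []
    for rec in records:
        fact = to_observation_fact(rec)
        if fact is None:
            unmapped.append(rec.local_code)
        else:
            facts.append(fact)
    return facts, unmapped
```

Collecting unmapped codes separately, rather than dropping them silently, gives a worklist for extending the mapping table before the next load.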

Z. Li—Co-first author.


Notes

  1. https://www.r-project.org/


Acknowledgement

The study is supported in part by an NCI U01 project, caCDE-QA (1U01CA180940-01A1), by R01-CA122443, and by an award from the Mayo Clinic Ovarian Cancer SPORE (P50 CA136393).

Author information

Corresponding author

Correspondence to Guoqian Jiang.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Hong, N. et al. (2017). Building an i2b2-Based Integrated Data Repository for Cancer Research: A Case Study of Ovarian Cancer Registry. In: Wang, F., Yao, L., Luo, G. (eds) Data Management and Analytics for Medicine and Healthcare. DMAH 2016. Lecture Notes in Computer Science, vol 10186. Springer, Cham. https://doi.org/10.1007/978-3-319-57741-8_8

  • DOI: https://doi.org/10.1007/978-3-319-57741-8_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-57740-1

  • Online ISBN: 978-3-319-57741-8

  • eBook Packages: Computer Science, Computer Science (R0)
