Abstract
Medical data is organically heterogeneous, and it usually varies significantly in both size and composition. Yet this data is also key to the recent and promising field of precision medicine, which focuses on identifying and tailoring appropriate medical treatments to the needs of individual patients, based on their specific conditions, medical history, lifestyle, genetics, and other individual factors. As we, and the database community at large, recognize that a “one size does not fit all” solution is required to work with such data, we present observations based on our experiences and on applications in the field of precision medicine. We make the case for the use of a polystore architecture and show how it applies to precision medicine. We discuss the reference architecture, describe some of its critical components (the array database), and discuss the specific types of analysis that directly benefit from this database architecture and the ways it serves the data.
Acknowledgments
This manuscript has been in part authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the U.S. Department of Energy, and under a joint program (MVP CHAMPION), between the U.S. Department of Energy, and the U.S. Department of Veterans Affairs.
The authors would like to thank the Intel Science and Technology Center (ISTC) for Big Data and the BigDAWG contributors (https://bigdawg.mit.edu/contributors) for their role in developing the BigDAWG system.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Begoli, E., Christian, J.B., Gadepally, V., Papadopoulos, S. (2017). An Emerging Role for Polystores in Precision Medicine. In: Begoli, E., Wang, F., Luo, G. (eds) Data Management and Analytics for Medicine and Healthcare. DMAH 2017. Lecture Notes in Computer Science, vol 10494. Springer, Cham. https://doi.org/10.1007/978-3-319-67186-4_5
DOI: https://doi.org/10.1007/978-3-319-67186-4_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67185-7
Online ISBN: 978-3-319-67186-4
eBook Packages: Computer Science, Computer Science (R0)