What Can the Big Data Eco-System and Data Analytics Do for E-Health? A Smooth Review Study

  • Conference paper
Bioinformatics and Biomedical Engineering (IWBBIO 2017)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 10208)

Abstract

In this paper we present a global overview of the current usage and future trends of the different big data ecosystems in the scientific domains of E-Health. Bioinformaticians as well as medical practitioners are currently generating very large amounts of data, and storing, managing, and analyzing these large-scale data sets still represents a big challenge. Big data ecosystems are involved at different steps of the production chain: the acquisition of both structured and unstructured data, the storage in traditional and/or NoSQL databases, and finally the analytics using the MapReduce framework. In this smooth survey we discuss all these parts of the ecosystem and give some use cases on real data sets from the domain.
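
The production chain described in the abstract ends with analytics in the MapReduce style. As a minimal sketch (not code from the paper), the map, shuffle, and reduce steps can be illustrated in plain Python by counting k-mers in a toy set of DNA reads. The reads, the k-mer length, and all function names below are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(read, k=3):
    """Map: emit a (k-mer, 1) pair for every k-length window of one read."""
    return [(read[i:i + k], 1) for i in range(len(read) - k + 1)]


def shuffle_phase(pairs):
    """Shuffle: group the emitted values by key, as a framework would do."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: sum the counts collected for one k-mer."""
    return key, sum(values)


def count_kmers(reads, k=3):
    """Run map, shuffle, and reduce sequentially on an in-memory list."""
    mapped = chain.from_iterable(map_phase(read, k) for read in reads)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Toy input, invented for illustration only.
reads = ["ACGTAC", "GTACGT"]
counts = count_kmers(reads)
```

In an actual Hadoop or Spark deployment, the map and reduce functions would run in parallel over partitions of the data, and the framework would handle the shuffle; the sequential version here only shows the shape of the computation.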

Author information

Corresponding author

Correspondence to Sidahmed Benabderrahmane.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Benabderrahmane, S. (2017). What Can the Big Data Eco-System and Data Analytics Do for E-Health? A Smooth Review Study. In: Rojas, I., Ortuño, F. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2017. Lecture Notes in Computer Science, vol 10208. Springer, Cham. https://doi.org/10.1007/978-3-319-56148-6_56

  • DOI: https://doi.org/10.1007/978-3-319-56148-6_56

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-56147-9

  • Online ISBN: 978-3-319-56148-6

  • eBook Packages: Computer Science, Computer Science (R0)
