Process Streaming Healthcare Data with Adaptive MapReduce Framework

Handbook of Large-Scale Distributed Computing in Smart Healthcare

Part of the book series: Scalable Computing and Communications (SCC)

Abstract

Body area networks, which use hundreds of interconnected sensors to monitor the health status of a human body, are among the most widely used healthcare scientific applications. Processing, analyzing, and monitoring the resulting streaming data in real time is very challenging, so an efficient cloud platform with highly elastic scaling capacity is needed to support such real-time streaming applications. State-of-the-art cloud platforms either lack the capability to process highly concurrent streaming data or scale only at the granularity of coarse-grained compute nodes. In this chapter, we propose a task-level adaptive MapReduce framework. The framework extends the generic MapReduce architecture by designing each Map and Reduce task as a scalable daemon process. Its key advantage is that scaling is designed at the Map and Reduce task level rather than at the compute-node level, as in traditional MapReduce. This design not only scales up and down in real time but also makes effective use of compute resources in the cloud data center. As a first step towards implementing the framework in a real cloud, we have developed a simulator that captures workload strength and provisions just the needed number of Map and Reduce tasks in real time. To further enhance the framework, we applied two streaming-data workload prediction methods, smoothing and the Kalman filter, to estimate the workload characteristics. We observe a 63.1% performance improvement when the Kalman filter is used to predict the workload. We also test the framework with a real streaming-data workload trace. Experimental results show that the framework schedules Map and Reduce tasks very efficiently as the arrival rate of the streaming data changes.
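The abstract does not give the prediction models or the provisioning rule, so the following is only a minimal sketch of the general idea under stated assumptions: the arrival rate is forecast one step ahead, and the number of tasks is sized from the forecast. The function names, the scalar random-walk Kalman model, the noise variances, and the `per_task_capacity` parameter are all hypothetical and are not taken from the chapter.

```python
import math


def smoothing_forecast(rates, alpha=0.5):
    """One-step-ahead exponential smoothing; forecast[i] predicts rates[i].

    The first forecast is seeded with the first observation (an assumption).
    """
    forecast = [rates[0]]
    for observed in rates[:-1]:
        forecast.append(alpha * observed + (1 - alpha) * forecast[-1])
    return forecast


def kalman_forecast(rates, process_var=1.0, measurement_var=4.0):
    """One-step-ahead scalar Kalman filter; forecast[i] predicts rates[i].

    Assumes a random-walk state model: the true arrival rate drifts with
    variance process_var and is observed with noise measurement_var.
    """
    estimate, error_var = rates[0], 1.0
    forecast = [estimate]
    for observed in rates[:-1]:
        # Predict: a random walk keeps the estimate, uncertainty grows.
        prior_var = error_var + process_var
        # Update: blend the prior estimate with the new observation.
        gain = prior_var / (prior_var + measurement_var)
        estimate = estimate + gain * (observed - estimate)
        error_var = (1 - gain) * prior_var
        forecast.append(estimate)
    return forecast


def tasks_needed(predicted_rate, per_task_capacity):
    """Number of task daemons needed to absorb the predicted rate.

    per_task_capacity is a hypothetical records-per-second limit of one
    Map (or Reduce) task; at least one task is always kept alive.
    """
    return max(1, math.ceil(predicted_rate / per_task_capacity))


if __name__ == "__main__":
    # Synthetic arrival rates (records per second), not data from the chapter.
    arrival_rates = [100, 120, 180, 260, 240, 150, 90]
    for rate, predicted in zip(arrival_rates, kalman_forecast(arrival_rates)):
        print(rate, round(predicted, 1), tasks_needed(predicted, 50))
```

Sizing from a forecast rather than from the last observation lets tasks be provisioned before the load arrives; how much that helps depends on how well the assumed model matches the real workload.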

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under grant No. 61233016, by the Ministry of Science and Technology of China under National 973 Basic Research Grants No. 2011CB302505 and No. 2013CB228206, by Guangdong Innovation Team Grant 201001D0104726115, and by the National Science Foundation under grant CCF-1016966. The work was also partially supported by an IBM Fellowship for Fan Zhang and by the Intellectual Ventures endowment to Tsinghua University.

Author information

Corresponding author

Correspondence to Fan Zhang.

Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

Zhang, F., Cao, J., Khan, S.U., Li, K., Hwang, K. (2017). Process Streaming Healthcare Data with Adaptive MapReduce Framework. In: Khan, S., Zomaya, A., Abbas, A. (eds) Handbook of Large-Scale Distributed Computing in Smart Healthcare. Scalable Computing and Communications. Springer, Cham. https://doi.org/10.1007/978-3-319-58280-1_3

  • DOI: https://doi.org/10.1007/978-3-319-58280-1_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-58279-5

  • Online ISBN: 978-3-319-58280-1

  • eBook Packages: Computer Science (R0)