Real-Time Data Flow Language Processing System for Handling Streams of Data

  • Conference paper
Scalable Information Systems (INFOSCALE 2014)

Abstract

The Apache Pig system compiles scripts written in Pig Latin into MapReduce jobs that process large data sets in parallel on distributed computing nodes. Pig inherits the limitations of MapReduce, most notably that MapReduce supports only batch processing. With the recent widespread use of smart devices, data streams are being generated at an explosive rate, and they increasingly need to be processed in real time. In this paper, we propose a data flow language processing system, called LAMA-CEP, which generates DAG-based stream processing services that continuously process unbounded streams of data in real time. We present a stream processing language, called Pig Latin Stream, which extends Pig Latin. Programs written in Pig Latin Stream are translated into distributed stream processing jobs, which are then executed on a highly scalable distributed stream processing system to process large streams of data in real time.
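The idea of translating a data flow program into a graph of continuously running operators can be illustrated with a minimal single-process sketch. It is not LAMA-CEP's implementation and does not use Pig Latin Stream syntax: the operator names (`filter_op`, `tumbling_count`), the `compile_plan` helper, the record schema, and the synthetic source are all hypothetical, and the plan is a linear chain, the simplest case of a DAG.

```python
from collections import Counter
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# Hypothetical record schema: (device_id, reading).
Record = Tuple[str, int]


def filter_op(pred: Callable[[Record], bool]):
    """Stream operator that forwards only records satisfying `pred`."""
    def run(stream: Iterable[Record]) -> Iterator[Record]:
        for rec in stream:
            if pred(rec):
                yield rec
    return run


def tumbling_count(size: int):
    """Stream operator that emits per-key counts after every `size` records
    (a count-based tumbling window), then starts a fresh window."""
    def run(stream: Iterable[Record]) -> Iterator[Dict[str, int]]:
        window: Counter = Counter()
        seen = 0
        for key, _ in stream:
            window[key] += 1
            seen += 1
            if seen == size:
                yield dict(window)
                window = Counter()
                seen = 0
    return run


def compile_plan(ops: List[Callable]):
    """Chain operators into a job; each stage pulls lazily from the previous
    one, so the job never needs the whole (unbounded) input at once."""
    def job(source: Iterable[Record]):
        stream = source
        for op in ops:
            stream = op(stream)
        return stream
    return job


def sensor_source() -> Iterator[Record]:
    """Synthetic unbounded source alternating between two device ids."""
    i = 0
    while True:
        yield (f"dev{i % 2}", i)
        i += 1


# "Script": drop readings divisible by 3, then count per device per window.
job = compile_plan([filter_op(lambda r: r[1] % 3 != 0), tumbling_count(4)])

# The source never ends, so take only the first two window results.
first_two = list(islice(job(sensor_source()), 2))
print(first_two)
```

Because every stage is a lazy generator, results are produced as windows fill rather than after the input ends, which is the contrast with batch processing that the abstract draws. A distributed system would place the operators on separate nodes and connect them with message queues instead of in-process iteration.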

Acknowledgments

This work was supported by the ICT R&D program of MSIP/IITP. [14-000-05-001, Smart Networking Core Technology Development].

Corresponding author

Correspondence to Choon Seo Park.

Copyright information

© 2015 Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Park, C.S., Jeong, JH., Lee, M., Lee, YJ., Lee, M., Hur, S.J. (2015). Real-Time Data Flow Language Processing System for Handling Streams of Data. In: Jung, J., Badica, C., Kiss, A. (eds) Scalable Information Systems. INFOSCALE 2014. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 139. Springer, Cham. https://doi.org/10.1007/978-3-319-16868-5_10

  • DOI: https://doi.org/10.1007/978-3-319-16868-5_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-16867-8

  • Online ISBN: 978-3-319-16868-5

  • eBook Packages: Computer Science, Computer Science (R0)