Abstract
The Apache Pig system compiles scripts written in Pig Latin into MapReduce jobs that process large data sets in parallel on distributed computing nodes. Pig inherits the limitations of MapReduce; for example, MapReduce supports only batch processing. With the recent widespread use of smart devices, streams of data are being generated at an explosive rate, and there is a growing need to process them in real time. In this paper, we propose a data flow language processing system, called LAMA-CEP, that generates DAG-based stream processing services to process unbounded streams of data continuously and in real time. We present a stream processing language, called Pig Latin Stream, that extends Pig Latin. Programs written in Pig Latin Stream are translated into distributed stream processing jobs, which are then executed on a highly scalable distributed stream processing system to process large streams of data in real time.
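The idea of translating a data flow program into a graph of stream operators can be illustrated with a minimal Python sketch. This is not the LAMA-CEP implementation, and the abstract does not show Pig Latin Stream syntax; the names `filter_op`, `foreach_op`, `compile_pipeline`, and `sensor_readings` are hypothetical, and the linear operator chain is only the simplest case of a DAG, run in a single process rather than on a distributed system.

```python
from itertools import count, islice

def filter_op(pred):
    """Hypothetical FILTER-style operator: keep records matching pred."""
    def run(stream):
        for rec in stream:
            if pred(rec):
                yield rec
    return run

def foreach_op(fn):
    """Hypothetical FOREACH-style operator: transform each record."""
    def run(stream):
        for rec in stream:
            yield fn(rec)
    return run

def compile_pipeline(ops):
    """Chain operators into one linear data flow (a degenerate DAG)."""
    def run(stream):
        for op in ops:
            stream = op(stream)
        return stream
    return run

def sensor_readings():
    """Unbounded synthetic source of (sequence number, reading) pairs."""
    for i in count():
        yield (i, (i * 7) % 10)

# "Compile" a two-step data flow, then attach it to the unbounded source.
job = compile_pipeline([
    filter_op(lambda r: r[1] >= 5),
    foreach_op(lambda r: {"seq": r[0], "value": r[1]}),
])

# Records are produced lazily, one at a time; take three to stop the demo.
first_three = list(islice(job(sensor_readings()), 3))
print(first_three)
```

Because every operator is a generator, each record flows through the whole chain as soon as it arrives, which is the continuous behavior that distinguishes stream processing from a batch job that must see its entire input first.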
Acknowledgments
This work was supported by the ICT R&D program of MSIP/IITP [14-000-05-001, Smart Networking Core Technology Development].
Copyright information
© 2015 Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Park, C.S., Jeong, JH., Lee, M., Lee, YJ., Lee, M., Hur, S.J. (2015). Real-Time Data Flow Language Processing System for Handling Streams of Data. In: Jung, J., Badica, C., Kiss, A. (eds) Scalable Information Systems. INFOSCALE 2014. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 139. Springer, Cham. https://doi.org/10.1007/978-3-319-16868-5_10
DOI: https://doi.org/10.1007/978-3-319-16868-5_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16867-8
Online ISBN: 978-3-319-16868-5
eBook Packages: Computer Science, Computer Science (R0)