Abstract
With the rapid growth of data in large cluster systems and the shift to big data in data-intensive organizations and applications, efficient and flexible SQL to Map-Reduce translators are needed to make terabytes to petabytes of data easy to access and retrieve, because conventional SQL-based data processing has limited scalability at this scale. In this paper we propose a Smart Sql to Map-Reduce Translator (S2MART), which transforms SQL queries into Map-Reduce jobs. It uses intra-query correlation to minimize redundant operations, and sub-query generation and a spiral-modeled database to reduce data transfer cost and network transfer cost. S2MART also applies the concept of database views to make parallelization of big data easy and streamlined. This paper gives a comprehensive study of the features of S2MART and compares the performance and correctness of our system with two widely used SQL to Map-Reduce translators, Hive and Pig.
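As a rough illustration of what translating a SQL query into a Map-Reduce job means, the sketch below expresses `SELECT customer, SUM(amount) FROM orders GROUP BY customer` as a map phase, a shuffle, and a reduce phase in plain Python. It is not the S2MART algorithm, and it runs in a single process rather than on a cluster; the table `orders`, its columns, and all function names are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical rows of a table orders(customer, amount); not from the paper.
ORDERS = [
    {"customer": "a", "amount": 10},
    {"customer": "b", "amount": 5},
    {"customer": "a", "amount": 7},
]


def map_phase(rows, key_col, value_col):
    """Emit one (key, value) pair per row, as a mapper for GROUP BY key_col would."""
    for row in rows:
        yield row[key_col], row[value_col]


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Aggregate each group, as a reducer computing SUM(value_col) would."""
    return {key: sum(values) for key, values in groups.items()}


def run_group_by_sum(rows, key_col, value_col):
    """Chain the three phases for SELECT key_col, SUM(value_col) ... GROUP BY key_col."""
    return reduce_phase(shuffle(map_phase(rows, key_col, value_col)))


result = run_group_by_sum(ORDERS, "customer", "amount")
print(result)  # {'a': 17, 'b': 5}
```

In a real deployment the map and reduce functions would run on different nodes and the shuffle would move data over the network, which is where the data transfer costs mentioned in the abstract arise.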
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gowraj, N., Ravi, P.V., V, M., Sumalatha, M.R. (2013). S2MART: Smart Sql to Map-Reduce Translators. In: Ishikawa, Y., Li, J., Wang, W., Zhang, R., Zhang, W. (eds) Web Technologies and Applications. APWeb 2013. Lecture Notes in Computer Science, vol 7808. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37401-2_56
DOI: https://doi.org/10.1007/978-3-642-37401-2_56
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37400-5
Online ISBN: 978-3-642-37401-2
eBook Packages: Computer Science, Computer Science (R0)