S2MART: Smart Sql to Map-Reduce Translators

Conference paper, Web Technologies and Applications (APWeb 2013)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7808)

Abstract

As data volumes in large cluster systems grow rapidly and data-intensive organizations and applications move into the era of big data, conventional SQL-based data processing no longer scales adequately. Efficient and flexible SQL-to-Map-Reduce translators are therefore needed to make terabytes to petabytes of data easy to access and retrieve. In this paper we propose a Smart Sql to Map-Reduce Translator (S2MART), which transforms SQL queries into Map-Reduce jobs. S2MART uses intra-query correlation to minimize redundant operations, and it uses sub-query generation together with a spiral-modeled database to reduce data transfer and network transfer costs. S2MART also applies the concept of database views to make the parallel processing of big data easy and streamlined. This paper presents a comprehensive study of the features of S2MART and compares the performance and correctness of our system with two widely used SQL-to-Map-Reduce translators, Hive and Pig.
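The abstract does not give S2MART's translation rules. As a generic illustration of what an SQL-to-Map-Reduce translator produces, the sketch below simulates in plain Python the map, shuffle, and reduce phases of a job for a grouped aggregate such as SELECT dept, COUNT(*), SUM(salary) FROM emp GROUP BY dept. The table emp, its columns, and all function names are hypothetical. Computing both aggregates in one job, instead of one job per aggregate, is a simplified example of the kind of redundancy that intra-query correlation targets; it is an assumption for illustration, not the paper's algorithm.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical input table "emp": each row is (dept, salary).
Row = tuple[str, int]


def map_phase(rows: Iterable[Row]) -> Iterator[tuple[str, tuple[int, int]]]:
    """Map: emit (GROUP BY key, partial aggregates) for every input row.

    COUNT(*) and SUM(salary) share the same grouping key, so a single
    map output carries the inputs for both aggregates (one table scan).
    """
    for dept, salary in rows:
        yield dept, (1, salary)


def shuffle(pairs: Iterable[tuple[str, tuple[int, int]]]) -> dict[str, list[tuple[int, int]]]:
    """Shuffle: group intermediate values by key, as the framework would."""
    groups: dict[str, list[tuple[int, int]]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: dict[str, list[tuple[int, int]]]) -> dict[str, tuple[int, int]]:
    """Reduce: combine partial aggregates into one output row per group."""
    result: dict[str, tuple[int, int]] = {}
    for dept, values in groups.items():
        count = sum(c for c, _ in values)
        total = sum(s for _, s in values)
        result[dept] = (count, total)
    return result


def run_query(rows: Iterable[Row]) -> dict[str, tuple[int, int]]:
    """Run the whole simulated job: map -> shuffle -> reduce."""
    return reduce_phase(shuffle(map_phase(rows)))


if __name__ == "__main__":
    emp = [("eng", 100), ("ops", 50), ("eng", 120)]
    print(run_query(emp))  # {'eng': (2, 220), 'ops': (1, 50)}
```

A naive translation would launch two jobs, one for COUNT(*) and one for SUM(salary), each scanning emp and shuffling the same keys. Merging them into a single job halves the scans and shuffles for this query.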

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gowraj, N., Ravi, P.V., V, M., Sumalatha, M.R. (2013). S2MART: Smart Sql to Map-Reduce Translators. In: Ishikawa, Y., Li, J., Wang, W., Zhang, R., Zhang, W. (eds) Web Technologies and Applications. APWeb 2013. Lecture Notes in Computer Science, vol 7808. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37401-2_56

  • DOI: https://doi.org/10.1007/978-3-642-37401-2_56

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-37400-5

  • Online ISBN: 978-3-642-37401-2

  • eBook Packages: Computer Science, Computer Science (R0)