DOI: 10.1145/2882903.2899408
research-article

SnappyData: A Hybrid Transactional Analytical Store Built On Spark

Published: 26 June 2016

ABSTRACT

In recent years, our customers have expressed frustration with the traditional approach of using a combination of disparate products to handle their streaming, transactional and analytical needs. The common practice of stitching heterogeneous environments together in custom ways has caused enormous production woes by increasing development complexity and total cost of ownership. With SnappyData, an open source platform, we propose a unified engine for real-time operational analytics, delivering stream analytics, OLTP and OLAP in a single integrated solution. We realize this platform through a seamless integration of Apache Spark (as a big data computational engine) with GemFire (as an in-memory transactional store with scale-out SQL semantics). In this demonstration, after presenting a few use case scenarios, we exhibit SnappyData as our in-memory solution for delivering truly interactive analytics (i.e., response times of a couple of seconds), even when faced with large data volumes or high-velocity streams. We show that SnappyData can exploit state-of-the-art approximate query processing techniques and a variety of data synopses. Finally, we allow the audience to define various high-level accuracy contracts (HACs) to communicate their accuracy requirements to SnappyData in an intuitive fashion.
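The abstract names approximate query processing and accuracy contracts without describing either mechanism. As a rough illustration only, the sketch below answers an AVG query from a uniform random sample, attaches a normal-approximation (central limit theorem) confidence interval, and checks the result against an accuracy requirement of the form "relative error at most e with confidence c". It assumes uniform sampling and ignores the finite-population correction; the names `AccuracyContract`, `approx_avg`, and `meets_contract` are hypothetical and are not SnappyData's API or its actual HAC implementation.

```python
import math
import random
import statistics
from dataclasses import dataclass

# Two-sided standard-normal critical values for common confidence levels.
Z_CRITICAL = {0.90: 1.645, 0.95: 1.960, 0.99: 2.576}


@dataclass(frozen=True)
class AccuracyContract:
    """Hypothetical accuracy contract: the answer's relative error must stay
    below max_rel_error at the given confidence level."""
    max_rel_error: float
    confidence: float = 0.95


def approx_avg(sample, confidence=0.95):
    """Estimate AVG(column) from a uniform random sample.

    Returns (estimate, half_width), where half_width = z * s / sqrt(n) is the
    CLT-based confidence-interval half-width. The finite-population
    correction is ignored, which makes the bound slightly conservative.
    """
    if len(sample) < 2:
        raise ValueError("need at least two sampled rows")
    z = Z_CRITICAL[confidence]
    estimate = statistics.mean(sample)
    half_width = z * statistics.stdev(sample) / math.sqrt(len(sample))
    return estimate, half_width


def meets_contract(sample, contract):
    """Return (estimate, ok), where ok is True when the estimated relative
    error satisfies the contract. If ok is False, a caller could fall back
    to an exact query or draw a larger sample."""
    estimate, half_width = approx_avg(sample, contract.confidence)
    if estimate == 0:
        return estimate, False  # relative error is undefined at zero
    return estimate, half_width / abs(estimate) <= contract.max_rel_error


if __name__ == "__main__":
    rng = random.Random(7)
    table = [rng.gauss(100.0, 15.0) for _ in range(100_000)]  # full "table"
    sample = rng.sample(table, 2_000)                         # 2% uniform sample
    contract = AccuracyContract(max_rel_error=0.01, confidence=0.95)
    est, ok = meets_contract(sample, contract)
    print(f"approx AVG = {est:.2f}, contract met: {ok}")
    print(f"exact  AVG = {statistics.mean(table):.2f}")
```

The design choice here is to express the user's requirement declaratively (an error bound plus a confidence level) and let the engine decide whether a sample-based answer is good enough, which is the general idea an accuracy contract conveys; how SnappyData itself selects samples or synopses is not described in this abstract.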

Published in

ACM Conferences
SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data
June 2016
2300 pages
ISBN: 9781450335317
DOI: 10.1145/2882903

Copyright © 2016 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 785 of 4,003 submissions, 20%
