SnappyData: A Hybrid Transactional Analytical Store Built On Spark

Authors:
Jags Ramnarayan, SnappyData Inc, Portland, OR, USA
Barzan Mozafari, University of Michigan, Ann Arbor & SnappyData Inc, Ann Arbor, USA
Sumedh Wale, SnappyData Inc, Pune, India
Sudhir Menon, SnappyData Inc, Portland, OR, USA
Neeraj Kumar, SnappyData Inc, Pune, India
Hemant Bhanawat, SnappyData Inc, Pune, India
Soubhik Chakraborty, SnappyData Inc, Pune, India
Yogesh Mahajan, SnappyData Inc, Pune, India
Rishitesh Mishra, SnappyData Inc, Pune, India
Kishor Bachhav, SnappyData Inc, Pune, India

SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data, June 2016, Pages 2153–2156, https://doi.org/10.1145/2882903.2899408

Published: 26 June 2016

ABSTRACT

In recent years, our customers have expressed frustration with the traditional approach of combining disparate products to handle their streaming, transactional, and analytical needs. The common practice of stitching together heterogeneous environments in custom ways has caused enormous production woes by increasing development complexity and total cost of ownership. With SnappyData, an open source platform, we propose a unified engine for real-time operational analytics, delivering stream analytics, OLTP, and OLAP in a single integrated solution. We realize this platform through a seamless integration of Apache Spark (as a big data computational engine) with GemFire (as an in-memory transactional store with scale-out SQL semantics). In this demonstration, after presenting a few use case scenarios, we exhibit SnappyData as our in-memory solution for delivering truly interactive analytics (i.e., responses within a couple of seconds) when faced with large data volumes or high-velocity streams. We show that SnappyData can exploit state-of-the-art approximate query processing techniques and a variety of data synopses. Finally, we allow the audience to define various high-level accuracy contracts (HAC) to communicate their accuracy requirements to SnappyData in an intuitive fashion.
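
To make the idea of approximate query processing with an accuracy requirement more concrete, the following is a minimal Python sketch. It is not SnappyData's API or its HAC mechanism: the function names `approximate_mean` and `meets_contract`, the uniform-sampling strategy, and the normal-approximation error bound are all assumptions chosen for illustration. It estimates an average from a small sample and checks whether the resulting error bound satisfies a caller-supplied relative-error limit.

```python
import math
import random
import statistics


def approximate_mean(values, sample_size, z=1.96, seed=0):
    """Estimate the mean of `values` from a uniform random sample.

    Returns (estimate, half_width). half_width is the half width of a
    normal-approximation confidence interval (z=1.96 is roughly 95%).
    Illustrative only; not how SnappyData computes its answers.
    """
    rng = random.Random(seed)
    sample = rng.sample(values, sample_size)  # uniform, without replacement
    estimate = statistics.fmean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(sample_size)
    return estimate, z * std_err


def meets_contract(estimate, half_width, max_relative_error):
    """Hypothetical accuracy contract: the interval half width must stay
    within `max_relative_error` of the estimate's magnitude."""
    if estimate == 0:
        return half_width == 0
    return half_width / abs(estimate) <= max_relative_error


if __name__ == "__main__":
    rng = random.Random(42)
    population = [rng.gauss(100.0, 15.0) for _ in range(100_000)]
    est, hw = approximate_mean(population, sample_size=2_000)
    ok = meets_contract(est, hw, max_relative_error=0.02)
    print(f"estimate={est:.2f} +/- {hw:.2f}, contract met: {ok}")
```

In a sketch like this, a caller whose contract is not met could retry with a larger sample or fall back to an exact scan; how a real engine makes that choice is outside what the abstract describes.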

Published in
SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data
June 2016, 2300 pages
ISBN: 9781450335317
DOI: 10.1145/2882903
General Chairs: Fatma Özcan (IBM Research, USA), Georgia Koutrika (HP Labs, USA)
Program Chair: Sam Madden (Massachusetts Institute of Technology, USA)
Copyright © 2016 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags
OLAP
OLTP
in-memory database
spark
spark streaming
stream analytics
stream processing
Acceptance Rates
Overall Acceptance Rate: 785 of 4,003 submissions, 20%