Transactional Auto Scaler: Elastic Scaling of Replicated In-Memory Transactional Data Grids

Published: 01 July 2014

Abstract

In this article, we introduce TAS (Transactional Auto Scaler), a system for automating the elastic scaling of replicated in-memory transactional data grids, such as NoSQL data stores or Distributed Transactional Memories. Applications of TAS range from online self-optimization of in-production applications, to the automatic generation of QoS/cost-driven elastic scaling policies, to what-if analysis of the scalability of transactional applications.
The key innovation at the core of TAS is a novel performance forecasting methodology that relies on the joint usage of analytical modeling and machine learning. By exploiting these two classically competing approaches in a synergistic fashion, TAS achieves the best of both worlds, namely, high extrapolation power and good accuracy, even when faced with complex workloads deployed over public cloud infrastructures.
We demonstrate the accuracy and feasibility of TAS’s performance forecasting methodology via an extensive experimental study based on a fully fledged prototype implementation integrated with a popular open-source in-memory transactional data grid (Red Hat’s Infinispan) and industry-standard benchmarks generating a breadth of heterogeneous workloads.

Published In

ACM Transactions on Autonomous and Adaptive Systems, Volume 9, Issue 2
July 2014
146 pages
ISSN: 1556-4665
EISSN: 1556-4703
DOI: 10.1145/2642710
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 July 2014
Accepted: 01 March 2014
Revised: 01 October 2013
Received: 01 May 2013
Published in TAAS Volume 9, Issue 2

Author Tags

  1. Transactional data grids
  2. elastic scaling
  3. machine learning
  4. performance forecasting
  5. queueing theory

Qualifiers

  • Research-article
  • Research
  • Refereed

Cited By

  • (2021) A Hybrid Machine Learning Approach for Performance Modeling of Cloud-Based Big Data Applications. The Computer Journal 65:12 (3123-3140). DOI: 10.1093/comjnl/bxab131. Online publication date: 20-Sep-2021
  • (2020) Taming the Contention in Consensus-based Distributed Systems. IEEE Transactions on Dependable and Secure Computing (1-1). DOI: 10.1109/TDSC.2020.2970186. Online publication date: 2020
  • (2019) Learning with Analytical Models. 2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) (778-786). DOI: 10.1109/IPDPSW.2019.00128. Online publication date: May-2019
  • (2018) LPaaS as Micro-Intelligence: Enhancing IoT with Symbolic Reasoning. Big Data and Cognitive Computing 2:3 (23). DOI: 10.3390/bdcc2030023. Online publication date: 3-Aug-2018
  • (2018) Model-Based Proactive Read-Validation in Transaction Processing Systems. 2018 IEEE 24th International Conference on Parallel and Distributed Systems (ICPADS) (481-488). DOI: 10.1109/PADSW.2018.8644605. Online publication date: Dec-2018
  • (2017) State-Machine and Deferred-Update Replication. IEEE Transactions on Parallel and Distributed Systems 28:3 (891-904). DOI: 10.1109/TPDS.2016.2590422. Online publication date: 1-Mar-2017
  • (2017) An Analytical Model of Hardware Transactional Memory. 2017 IEEE 25th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS) (221-231). DOI: 10.1109/MASCOTS.2017.29. Online publication date: Sep-2017
  • (2016) A Combined Analytical Modeling Machine Learning Approach for Performance Prediction of MapReduce Jobs in Cloud Environment. 2016 18th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC) (431-439). DOI: 10.1109/SYNASC.2016.072. Online publication date: Sep-2016
  • (2015) QoS-Aware Autonomic Resource Management in Cloud Computing. ACM Computing Surveys 48:3 (1-46). DOI: 10.1145/2843889. Online publication date: 22-Dec-2015
  • (2015) Hybrid Machine Learning/Analytical Models for Performance Prediction. Proceedings of the 6th ACM/SPEC International Conference on Performance Engineering (341-344). DOI: 10.1145/2668930.2688823. Online publication date: 28-Jan-2015
