RepliSmart: A Smart Replication framework for optimal query throughput in read-heavy environments

Authors:
R. K. N. Sai Krishna, Teradata India, Hyderabad, India
Chandrasekhar Tekur, Teradata India, Hyderabad, India
Arnab Phani, Teradata India, Hyderabad, India

CODS-COMAD '19: Proceedings of the ACM India Joint International Conference on Data Science and Management of Data, January 2019, Pages 164–170. https://doi.org/10.1145/3297001.3297022

Published: 03 January 2019

ABSTRACT

Replicating data in a database system is a way to improve query throughput. An ecosystem in which data is replicated also allows more parallelism and offers better fault tolerance. To save space, a data set may be replicated on only a few nodes: one data set could be replicated on many nodes and another on only a few, based on how often each is accessed. Today, the decision about which data to replicate on which nodes is made from a few assumptions at the time of replication, and once the data is replicated, it stays on those nodes. Over time, the queries that access a data set may change, so that the least-replicated data becomes the most in demand, and vice versa.
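The idea of tying replica counts to the access ratio can be sketched as follows. This is a minimal illustration, not the paper's method: the function name `plan_replica_counts`, the fixed replica budget, the per-unit minimum, and the proportional (largest-remainder) policy are all assumptions introduced here.

```python
def plan_replica_counts(access_counts, total_replicas, min_replicas=1):
    """Split a replica budget across data units by their share of accesses.

    Hypothetical policy: every unit keeps `min_replicas` copies and the
    spare budget is divided in proportion to observed access counts,
    using largest-remainder rounding so the budget is used exactly.
    """
    units = sorted(access_counts)
    spare = total_replicas - min_replicas * len(units)
    if spare < 0:
        raise ValueError("replica budget is below the per-unit minimum")

    plan = {unit: min_replicas for unit in units}
    total_accesses = sum(access_counts.values())
    if total_accesses == 0:
        return plan  # no access signal yet: keep only the minimum copies

    # Integer arithmetic avoids floating-point rounding surprises.
    remainders = {}
    for unit in units:
        whole, remainders[unit] = divmod(spare * access_counts[unit], total_accesses)
        plan[unit] += whole

    # Hand the replicas lost to rounding to the largest remainders.
    leftover = total_replicas - sum(plan.values())
    for unit in sorted(units, key=lambda u: remainders[u], reverse=True)[:leftover]:
        plan[unit] += 1
    return plan


# Made-up access counts for two observation windows.
last_month = {"orders": 700, "customers": 200, "audit_log": 100}
this_month = {"orders": 100, "customers": 200, "audit_log": 700}
print(plan_replica_counts(last_month, total_replicas=10))
print(plan_replica_counts(this_month, total_replicas=10))
```

Running the planner on both windows shows the situation described above: a placement fixed from the first window would leave `audit_log` under-replicated once it becomes the most accessed data unit.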

Another aspect to consider is the storage format of the replicas. From a data storage perspective, a columnar database can be a great choice for some applications, whereas a row-based one can be the better option for others. Storing all replicas in a single format would therefore be inefficient. In this paper, we propose a framework, RepliSmart, in which a smart controller redirects incoming queries among the connected nodes to balance the workload. The framework employs learning-based, on-demand replication, wherein the number of replicas of a data unit (at table or database level) varies as data access patterns change over time. Additionally, the smart controller dynamically defines the storage format of each replica, so that a few of the replicas can be columnar while the rest are row-based. The smart controller redirects each user request to an appropriate node, based on whether the query would execute better on columnar or row-based data. The proposed framework results in higher query throughput and better space utilization for read-heavy query workloads.
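The routing role of the smart controller can be illustrated with a short sketch. The abstract does not say how the controller decides between formats or measures load, so everything here is an assumption made for illustration: the `Replica` record, the `preferred_storage` heuristic (scans over at most half the columns favour columnar storage), and least-active-queries load balancing in `route`.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    node: str
    data_unit: str
    storage: str              # "columnar" or "row"
    active_queries: int = 0   # assumed load signal


def preferred_storage(columns_read, total_columns, is_point_lookup):
    """Toy heuristic (an assumption, not the paper's classifier)."""
    if not is_point_lookup and columns_read * 2 <= total_columns:
        return "columnar"
    return "row"


def route(replicas, data_unit, storage):
    """Pick the least-loaded replica of `data_unit`, preferring `storage`."""
    candidates = [r for r in replicas if r.data_unit == data_unit]
    if not candidates:
        raise LookupError(f"no replica holds {data_unit!r}")
    # Fall back to any format if no replica uses the preferred one.
    pool = [r for r in candidates if r.storage == storage] or candidates
    chosen = min(pool, key=lambda r: r.active_queries)
    chosen.active_queries += 1  # the caller decrements when the query ends
    return chosen


replicas = [
    Replica("node-1", "sales", "columnar", active_queries=3),
    Replica("node-2", "sales", "columnar", active_queries=1),
    Replica("node-3", "sales", "row", active_queries=0),
]

# An aggregate over 2 of 20 columns versus a single-row lookup.
scan_target = route(replicas, "sales", preferred_storage(2, 20, False))
lookup_target = route(replicas, "sales", preferred_storage(20, 20, True))
print(scan_target.node, lookup_target.node)
```

Keeping the format decision separate from the load-balancing step means either policy can be replaced, for example by a learned workload classifier, without touching the other.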

Index Terms

Information systems > Data management systems > Database management system engines

Published in
CODS-COMAD '19: Proceedings of the ACM India Joint International Conference on Data Science and Management of Data
January 2019
380 pages
ISBN:9781450362078
DOI:10.1145/3297001
General Chairs: Lipika Dey (TCS Innovation Labs), Surajit Chaudhury (Microsoft Research)
Program Chairs: Raghu Krishnapuram (Robert Bosch Center, IISc Bangalore), Parag Singla (IIT Delhi)
Publications Chair: Rishiraj Saha Roy (Max Planck Institute for Informatics)
Copyright © 2019 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Columnar storage
data synchronization
databases
dynamic replication
load balancing
query routing
query throughput
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
CODS-COMAD '19 Paper Acceptance Rate: 62 of 198 submissions, 31%
Overall Acceptance Rate: 197 of 680 submissions, 29%