DOI: 10.1145/3297001.3297022
research-article

RepliSmart: A Smart Replication framework for optimal query throughput in read-heavy environments

Published: 03 January 2019

ABSTRACT

Replicating data across database nodes is a way to improve query performance (throughput). An ecosystem in which data is replicated can also offer increased parallelism and better fault tolerance. In some cases, replicating a set of data on only a few nodes, for higher space efficiency, could be the preferred choice. A particular set of data could be replicated on many nodes and other sets on only a few, based on the access ratio of the data. Today, the decision about which data to replicate on which nodes is based on a few presumptions made at the time of replication. Once the data is replicated, it remains on those nodes. Over time, the queries accessing a set of data might change, so the data that is least replicated might become the most requested, and vice versa.
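To make the idea of access-ratio-based replication concrete, the following is a minimal sketch, not the method from the paper. It assumes a hypothetical access log that lists one data-unit name per query, and it assigns each unit a replica count proportional to its share of accesses. The function name `replica_counts` and the parameters `total_nodes` and `min_replicas` are illustrative assumptions.

```python
from collections import Counter

def replica_counts(access_log, total_nodes, min_replicas=1):
    """Give each data unit a replica count proportional to its access share.

    access_log   -- iterable of data-unit names, one entry per query (assumed format)
    total_nodes  -- number of nodes available to hold replicas
    min_replicas -- floor so that rarely accessed data is never lost
    """
    hits = Counter(access_log)
    total = sum(hits.values())
    counts = {}
    for unit, n in hits.items():
        proportional = round(n * total_nodes / total)
        # Clamp to the range [min_replicas, total_nodes].
        counts[unit] = max(min_replicas, min(total_nodes, proportional))
    return counts

# Recomputing on a newer log shows how the allocation follows the workload.
earlier = ["orders"] * 8 + ["customers"] * 2
later = ["orders"] * 2 + ["customers"] * 8
print(replica_counts(earlier, total_nodes=10))
print(replica_counts(later, total_nodes=10))
```

Running the function periodically on a recent window of the log is one simple way to avoid the fixed, one-time placement that the abstract describes as the current practice.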

Another aspect to consider is the storage format of the replicas. From a data storage perspective, a columnar database could be a great choice for some applications, whereas a row-based option could be a better fit for others. Storing all the replicas in only one of the two storage formats would be inefficient. In this paper, we propose a framework, RepliSmart, in which a smart controller redirects incoming queries among the connected nodes to balance the workload. The framework employs learning-based on-demand replication, wherein the number of replicas of a data unit (at the table or database level) varies as the data access patterns vary over time. Additionally, the smart controller dynamically defines the storage format of each replica, so that a few of the replicas are columnar and the remaining ones are row-based. The smart controller redirects each user request to an appropriate node based on whether the query would execute better on columnar or row-based data. The proposed framework results in higher query throughput and better space utilization for read-heavy query workloads.
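The routing behavior of the smart controller can be illustrated with a small sketch. The abstract does not specify how the controller decides between formats, so the rule used here is an assumption for illustration only: queries containing aggregates or GROUP BY are sent to columnar replicas, and all other queries go to row-based replicas. The names `Node`, `preferred_storage`, and `route` are hypothetical, and load is approximated by a simple counter of assigned queries.

```python
import re
from dataclasses import dataclass
from typing import List

# Assumed heuristic: aggregate functions or GROUP BY suggest an analytical query.
AGGREGATE = re.compile(r"\b(sum|avg|count|min|max)\s*\(|\bgroup\s+by\b", re.IGNORECASE)

@dataclass
class Node:
    name: str
    storage: str     # "row" or "columnar"
    active: int = 0  # number of queries assigned so far (stand-in for load)

def preferred_storage(sql: str) -> str:
    """Toy classifier: aggregates favor columnar, everything else row-based."""
    return "columnar" if AGGREGATE.search(sql) else "row"

def route(sql: str, nodes: List[Node]) -> Node:
    """Pick the least-loaded node whose storage format suits the query."""
    wanted = preferred_storage(sql)
    # Fall back to all nodes if no replica exists in the preferred format.
    candidates = [n for n in nodes if n.storage == wanted] or nodes
    target = min(candidates, key=lambda n: n.active)
    target.active += 1
    return target

cluster = [Node("n1", "row"), Node("n2", "columnar"), Node("n3", "columnar")]
for query in [
    "SELECT region, SUM(amount) FROM sales GROUP BY region",
    "SELECT * FROM sales WHERE id = 7",
]:
    print(query, "->", route(query, cluster).name)
```

A learning-based controller, as proposed in the paper, would replace the fixed regular expression with a model trained on observed query performance; the sketch only shows where such a decision plugs into the routing step.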

    • Published in

      CODS-COMAD '19: Proceedings of the ACM India Joint International Conference on Data Science and Management of Data
      January 2019
      380 pages
      ISBN:9781450362078
      DOI:10.1145/3297001

      Copyright © 2019 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • research-article
      • Research
      • Refereed limited

      Acceptance Rates

      CODS-COMAD '19 Paper Acceptance Rate: 62 of 198 submissions, 31%
      Overall Acceptance Rate: 197 of 680 submissions, 29%