Abstract
Multi-dimensional data is widely used in scenarios such as cluster monitoring and user behavior analysis for web services. Such data is usually managed by distributed databases with a replication strategy, which improves availability, fault tolerance, and I/O throughput. Normally, all replicas share the same physical layout on disk, which database administrators design according to the target workload. However, it is difficult to derive a single layout that benefits as many queries as possible, because a layout that accommodates only some queries can negatively impact the others. To address this limitation, we propose heterogeneous replicas for multi-dimensional data, which provide higher query throughput without additional disk usage and without slowing down writes, while still ensuring high availability and load balance. The proposed replication method allows different replicas to be logically identical while having different physical data layouts on disk. We verified the efficiency of our method in a NoSQL system, Cassandra, with the TPC-H dataset and with a synthetically generated dataset. The results show that our method outperforms state-of-the-art solutions.
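The idea of logically identical replicas with different physical layouts can be illustrated with a minimal sketch. This is not the authors' implementation and does not use Cassandra's API: the `Replica` class, the `route` function, and the cost model (rows scanned, with a binary search available only on a replica's leading sort column) are all assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right


class Replica:
    """Hypothetical replica: same rows as its peers, stored in its own sort order."""

    def __init__(self, rows, sort_columns):
        self.sort_columns = list(sort_columns)
        # Physical layout: rows ordered by this replica's column ordering.
        self.rows = sorted(rows, key=lambda r: tuple(r[c] for c in self.sort_columns))
        # Keys of the leading sort column, used for binary search.
        self.lead_keys = [r[self.sort_columns[0]] for r in self.rows]

    def _span(self, column, low, high):
        # A contiguous slice exists only when the predicate is on the leading column.
        if column == self.sort_columns[0]:
            return bisect_left(self.lead_keys, low), bisect_right(self.lead_keys, high)
        return 0, len(self.rows)

    def rows_scanned(self, column, low, high):
        # Assumed cost model: number of rows that must be read.
        start, end = self._span(column, low, high)
        return end - start

    def range_query(self, column, low, high):
        start, end = self._span(column, low, high)
        return [r for r in self.rows[start:end] if low <= r[column] <= high]


def route(replicas, column, low, high):
    # Send the query to the replica whose layout reads the fewest rows.
    return min(replicas, key=lambda rep: rep.rows_scanned(column, low, high))


# Synthetic monitoring data: 4 hosts, 100 timestamps each.
rows = [{"host": h, "time": t} for h in range(4) for t in range(100)]
by_host = Replica(rows, ["host", "time"])
by_time = Replica(rows, ["time", "host"])

chosen = route([by_host, by_time], "time", 10, 12)
print(chosen.sort_columns, chosen.rows_scanned("time", 10, 12))
```

Both replicas hold the same 400 rows, so either can answer any query, which preserves availability. Under the assumed cost model, a range predicate on `time` is routed to the replica sorted by `time`, and a predicate on `host` to the replica sorted by `host`.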
Acknowledgments
This work was supported by the Natural Science Foundation of China (Nos. 61802224 and 71690231) and the Beijing Key Laboratory of Industrial Bigdata System and Application. We also thank the anonymous reviewers for their valuable comments.
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Qiao, J. et al. (2020). Heterogeneous Replicas for Multi-dimensional Data Management. In: Nah, Y., Cui, B., Lee, SW., Yu, J.X., Moon, YS., Whang, S.E. (eds) Database Systems for Advanced Applications. DASFAA 2020. Lecture Notes in Computer Science, vol 12112. Springer, Cham. https://doi.org/10.1007/978-3-030-59410-7_2
DOI: https://doi.org/10.1007/978-3-030-59410-7_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-59409-1
Online ISBN: 978-3-030-59410-7
eBook Packages: Mathematics and Statistics (R0)