Heterogeneous Replicas for Multi-dimensional Data Management

Qiao, Jialin; Kang, Yuyuan; Huang, Xiangdong; Rui, Lei; Jiang, Tian; Wang, Jianmin; Yu, Philip S.

doi:10.1007/978-3-030-59410-7_2

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12112)

Included in the following conference series:

International Conference on Database Systems for Advanced Applications

Abstract

Multi-dimensional data is widely used in different scenarios, such as cluster monitoring and user behavior analysis for web services. The data is usually managed by distributed databases with a replication strategy, which enhances the availability, fault-tolerance, and I/O throughput. Normally, these replicas share the same physical layout on the disk, which is designed by database administrators according to the target workload. However, it is critical to derive an optimal layout that benefits as many queries as possible, because a layout that accommodates only some queries can negatively impact the others. To tackle this limitation, we propose heterogeneous replicas for multi-dimensional data that provide a higher query throughput without additional disk occupation and without slowing down the writing speed, while still ensuring high availability and load balance. The proposed replication method allows different replicas to be logically identical while having different physical data layouts on the disk. We verified the efficiency of our method in a NoSQL system, Cassandra, with the TPC-H dataset and with a synthetically generated dataset. The results show that our method outperforms state-of-the-art solutions.
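
The idea of replicas that are logically identical but physically different can be illustrated with a small sketch. It is not the method of the paper and does not use Cassandra's API: the `Replica` class, the `route` function, and the toy rows are hypothetical names introduced only for illustration, and the routing rule (pick the replica whose leading sort column matches the filtered column) is a deliberately simple assumption.

```python
from bisect import bisect_left, bisect_right


class Replica:
    """One copy of the data, sorted by a replica-specific leading column (assumed layout)."""

    def __init__(self, rows, lead):
        self.lead = lead
        # Same logical rows in every replica; only the physical order differs.
        self.rows = sorted(rows, key=lambda r: r[lead])
        self.lead_values = [r[lead] for r in self.rows]

    def range_query(self, col, lo, hi):
        """Return (matching rows, number of rows scanned) for lo <= row[col] <= hi."""
        if col == self.lead:
            # The layout matches the filter: binary search, then read a contiguous slice.
            start = bisect_left(self.lead_values, lo)
            end = bisect_right(self.lead_values, hi)
            hits = self.rows[start:end]
            return hits, len(hits)
        # The layout does not match the filter: every row has to be scanned.
        hits = [r for r in self.rows if lo <= r[col] <= hi]
        return hits, len(self.rows)


def route(replicas, col):
    """Hypothetical router: prefer a replica whose leading column is the filtered one."""
    for replica in replicas:
        if replica.lead == col:
            return replica
    return replicas[0]


# Toy two-dimensional rows; the second column is a permutation of 0..9.
rows = [(i, (i * 7) % 10) for i in range(10)]
replicas = [Replica(rows, lead=0), Replica(rows, lead=1)]

hits, scanned = route(replicas, col=1).range_query(1, 3, 4)
print(hits, scanned)
```

In this toy setting both replicas hold the same ten rows, so either one can answer any query, but a range filter on the second column scans far fewer rows on the replica that is sorted by that column.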

Notes

1. https://docs.datastax.com/en/cql/3.3/cql/cql_reference/cqlshTracing.html

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Nos. 61802224, 71690231) and the Beijing Key Laboratory of Industrial Bigdata System and Application. We also thank the anonymous reviewers for their valuable comments.

Author information

Authors and Affiliations

KLiss, MOE; BNRist; School of Software, Tsinghua University, Beijing, China
Jialin Qiao, Yuyuan Kang, Xiangdong Huang, Lei Rui, Tian Jiang & Jianmin Wang
Research Center for Big Data, Tsinghua University, Beijing, China
Jialin Qiao, Yuyuan Kang, Xiangdong Huang, Lei Rui, Tian Jiang & Jianmin Wang
University of Illinois, Champaign, IL, USA
Philip S. Yu

Corresponding author

Correspondence to Xiangdong Huang.

Editor information

Editors and Affiliations

Dankook University, Yongin, Korea (Republic of)
Yunmook Nah
Peking University, Haidian, China
Bin Cui
Sungkyunkwan University, Suwon, Korea (Republic of)
Sang-Won Lee
Department of System Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong, Hong Kong
Jeffrey Xu Yu
Kangwon National University, Chunchon, Korea (Republic of)
Yang-Sae Moon
Korea Advanced Institute of Science and Technology, Daejeon, Korea (Republic of)
Steven Euijong Whang

About this paper

Cite this paper

Qiao, J. et al. (2020). Heterogeneous Replicas for Multi-dimensional Data Management. In: Nah, Y., Cui, B., Lee, SW., Yu, J.X., Moon, YS., Whang, S.E. (eds) Database Systems for Advanced Applications. DASFAA 2020. Lecture Notes in Computer Science, vol 12112. Springer, Cham. https://doi.org/10.1007/978-3-030-59410-7_2

DOI: https://doi.org/10.1007/978-3-030-59410-7_2
Published: 18 September 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-59409-1
Online ISBN: 978-3-030-59410-7
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
