MiNT-OLAP cluster: minimizing network transmission cost in OLAP cluster for main memory analytical database

Jiao, Min; Zhang, Yansong; Wang, Zhanwei; Wang, Shan

doi:10.1007/s11704-012-1080-8

Research Article
Published: 04 December 2012

Volume 6, pages 668–676, (2012)

Frontiers of Computer Science

Min Jiao^1,3, Yansong Zhang^2, Zhanwei Wang^1,3 & Shan Wang^1,3

Abstract

Powerful storage, high performance, and scalability are the most important issues for analytical databases. These three factors interact with each other: powerful storage needs less scalability but higher performance; high performance means less consumption of indexes and other materializations for storage, and fewer processing nodes; larger scale relieves stress on powerful storage and on the high-performance processing engine. Some analytical databases (ParAccel, Teradata) bind their performance to advanced hardware support; some (Asterdata, Greenplum) rely on the highly scalable MapReduce framework; and some (MonetDB, Sybase IQ, Vertica) emphasize the performance of the processing engine and storage engine. All these approaches can be integrated into a storage-performance-scalability (S-P-S) model, and future large-scale analytical processing can be built on moderate clusters to minimize dependency on expensive hardware. Most importantly, a simple software framework is fundamental to keeping pace with the development of hardware technologies. In this paper, we propose a schema-aware on-line analytical processing (OLAP) model with deep optimization based on native features of the star or snowflake schema. The OLAP model divides the whole process into several stages, each of which pipes its output to the next; we minimize the size of the output data in each stage, whether in centralized or clustered processing. We extend this mechanism to cluster processing using two major techniques: one uses NetMemory as a broadcasting-protocol-based dimension mirror synchronizing buffer; the other is a predicate-vector-based DDTA-OLAP cluster model, which minimizes the data dependency of the star-join using bitmap vectors. Our OLAP model aims to minimize network transmission cost (MiNT for short) for OLAP clusters and to support a scalable but simple distributed storage model for large-scale cluster processing. Finally, the experimental results show the speedup and scalability performance.
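
The abstract does not spell out how predicate vectors are built or used, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes each dimension table has dense, 0-based surrogate keys, so that a dimension predicate can be reduced to one bit per dimension row and a fact row can be filtered by looking up its foreign keys in those bit vectors. All function names, table layouts, and sample values are hypothetical.

```python
# Sketch of a predicate-vector (bitmap) star-join filter.
# Assumption: dimension surrogate keys are dense and 0-based, so the key
# doubles as an index into the bit vector. Names and data are illustrative.

def build_predicate_vector(dim_rows, predicate):
    """Return one bit per dimension row: 1 if the row satisfies the predicate."""
    return [1 if predicate(row) else 0 for row in dim_rows]


def star_join_sum(fact_rows, vectors, measure):
    """Scan the fact table once, keeping rows whose foreign keys all hit a 1 bit.

    vectors maps a foreign-key column name to that dimension's predicate vector.
    Returns (number of matching fact rows, sum of the measure over them).
    """
    matched, total = 0, 0
    for row in fact_rows:
        if all(vec[row[fk]] for fk, vec in vectors.items()):
            matched += 1
            total += row[measure]
    return matched, total


# Hypothetical dimension tables; list position is the surrogate key.
customer = [{"region": "ASIA"}, {"region": "EUROPE"}, {"region": "ASIA"}]
date = [{"year": 1997}, {"year": 1998}]

# Hypothetical fact table referencing the dimensions by surrogate key.
lineorder = [
    {"custkey": 0, "datekey": 0, "revenue": 10},
    {"custkey": 1, "datekey": 0, "revenue": 20},
    {"custkey": 2, "datekey": 1, "revenue": 30},
    {"custkey": 2, "datekey": 0, "revenue": 40},
]

customer_vec = build_predicate_vector(customer, lambda r: r["region"] == "ASIA")
date_vec = build_predicate_vector(date, lambda r: r["year"] == 1997)

result = star_join_sum(
    lineorder, {"custkey": customer_vec, "datekey": date_vec}, "revenue"
)
print(customer_vec, date_vec, result)  # [1, 0, 1] [1, 0] (2, 50)
```

In a cluster setting, the appeal of this representation is that only the compact bit vectors, rather than filtered dimension tuples, would need to reach the nodes holding fact-table partitions, which is in line with the stated goal of minimizing network transmission cost.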

Author information

Authors and Affiliations

1. DEKE Lab, Renmin University of China, Beijing, 100872, China
Min Jiao, Zhanwei Wang & Shan Wang
2. National Survey Research Center, Renmin University of China, Beijing, 100872, China
Yansong Zhang
3. School of Information, Renmin University of China, Beijing, 100872, China
Min Jiao, Zhanwei Wang & Shan Wang

Corresponding author

Correspondence to Min Jiao.

Additional information

Min Jiao is a PhD candidate in the School of Information, Renmin University of China. Her research interests include main memory databases, OLAP, and high performance databases.

Yansong Zhang is a lecturer in the School of Information, Renmin University of China. His current research interests include main memory databases, OLAP, and high performance databases.

Zhanwei Wang is a master’s student at Renmin University of China. His research interests include main memory databases, OLAP, and high performance databases.

Shan Wang is a professor and PhD supervisor in the School of Information, Renmin University of China. She is a senior member of the China Computer Federation. Her research interests include high performance databases, data warehouses, knowledge engineering, and information retrieval.

About this article

Cite this article

Jiao, M., Zhang, Y., Wang, Z. et al. MiNT-OLAP cluster: minimizing network transmission cost in OLAP cluster for main memory analytical database. Front. Comput. Sci. 6, 668–676 (2012). https://doi.org/10.1007/s11704-012-1080-8

Received: 24 June 2011
Accepted: 16 August 2012
Published: 04 December 2012
Issue Date: December 2012
DOI: https://doi.org/10.1007/s11704-012-1080-8
