Efficient Aggregation Query Processing for Large-Scale Multidimensional Data by Combining RDB and KVS

Watari, Yuya; Keyaki, Atsushi; Miyazaki, Jun; Nakamura, Masahide

doi:10.1007/978-3-319-98809-2_9

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11029)

Included in the following conference series:

International Conference on Database and Expert Systems Applications

Abstract

This paper presents a highly efficient aggregation query processing method for large-scale multidimensional data. Recent developments in network technologies have led to the generation of large amounts of multidimensional data, such as sensor data, and aggregation queries play an important role in analyzing such data. Relational databases (RDBs) support efficient aggregation queries through indexes that speed up query processing, but they can become a bottleneck as the data size increases. A distributed key-value store (D-KVS), on the other hand, provides scale-out performance for data insertion throughput; however, because a D-KVS offers insufficient support for indexes, querying multidimensional data sometimes requires a full data scan. The proposed method combines an RDB and a D-KVS so that their advantages complement each other. In addition, a novel technique is presented in which the data are divided into several subsets called grids, and the aggregated values for each grid are precomputed. This technique improves query processing performance by reducing the amount of scanned data. We evaluated the efficiency of the proposed method by comparing it with current state-of-the-art methods and showed that it outperforms them in both query and insertion performance.

Notes

1. Our implementation uses a custom filter in HBase for a prefix scan in Step 3 of Algorithm 2, which efficiently extracts the data contained within the given query range.
2. https://github.com/shojinishimura/Tiny-MD-HBase.

Acknowledgements

This work was partly supported by JSPS KAKENHI Grant Numbers 15H02701, 16H02908, 17K12684, 18H03242, 18H03342, and ACT-I, JST.

Author information

Authors and Affiliations

Department of Computer Science, School of Computing, Tokyo Institute of Technology, Tokyo, Japan
Yuya Watari, Atsushi Keyaki & Jun Miyazaki
Kobe University, Kobe, Japan
Masahide Nakamura

Corresponding author

Correspondence to Jun Miyazaki.

Editor information

Editors and Affiliations

Clausthal University of Technology, Clausthal-Zellerfeld, Germany
Sven Hartmann
Victoria University of Wellington, Wellington, New Zealand
Hui Ma
Paul Sabatier University, Toulouse, France
Abdelkader Hameurlain
University of Regensburg, Regensburg, Germany
Günther Pernul
Johannes Kepler University, Linz, Austria
Roland R. Wagner

About this paper

Cite this paper

Watari, Y., Keyaki, A., Miyazaki, J., Nakamura, M. (2018). Efficient Aggregation Query Processing for Large-Scale Multidimensional Data by Combining RDB and KVS. In: Hartmann, S., Ma, H., Hameurlain, A., Pernul, G., Wagner, R. (eds) Database and Expert Systems Applications. DEXA 2018. Lecture Notes in Computer Science, vol 11029. Springer, Cham. https://doi.org/10.1007/978-3-319-98809-2_9

DOI: https://doi.org/10.1007/978-3-319-98809-2_9
Published: 09 August 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-98808-5
Online ISBN: 978-3-319-98809-2
eBook Packages: Computer Science, Computer Science (R0)